Securing Source Code Requires Securing the Commit…and a Whole Lot More

Howard Ting

CEO

January 27, 2021

Updated:

March 21, 2025

Source code is highly dynamic, unstructured data that is constantly modified by scores of users, which makes it almost impossible to control with traditional DLP regex signatures and tagging. Learn about the challenges of protecting source code and how to prevent future incidents like the recent SolarWinds attack in this blog.

Source code is arguably every organization’s most valuable intellectual property — and also one of the hardest to protect. As businesses of all types are increasingly defined by their software, protecting source code is often equivalent to protecting the business itself. A source code breach could mean the loss of your primary competitive advantage or exposure of your proprietary business logic to attackers and competitors. Software is also one of the strongest bonds of trust between an organization and its customers. Vulnerabilities can directly put customers at risk, and compromises can irreparably damage relationships. With direct access to source code, it is far easier for attackers to find these all-important vulnerabilities and, in some cases, even introduce threats directly into the code.

Unfortunately, these crown jewels are at risk from both internal sources and an ever-growing list of external threat actors. However, most organizations have an important gap when it comes to protecting their code from being compromised. Development tools and pipelines are designed to maximize productivity and thus foster a culture of open access, which typically results in many developers having full access to source code. And while development tools have robust checks and balances to control what gets committed, they don’t keep code from getting lost or exposed. On the other hand, traditional security tools such as DLP have always had problems when it comes to identifying and protecting proprietary source code, and those problems have only gotten worse with the rise of open-source software (OSS).

With this in mind, let’s take a look at some of the new threats to source code, the challenges they pose, and what organizations can do about them today.

Threats: It’s Not Just About Industrial Espionage Anymore

Source code is at risk from an increasingly diverse set of threat actors, often with very different motivations. Traditionally, industrial espionage has been the prime driver behind source code theft, and it remains a serious risk today. Rival companies and state-sponsored actors have targeted source code from virtually every industry in order to quickly develop competitive products.

The scale of the problem is often hard to comprehend. The most recent update from the Commission on the Theft of American Intellectual Property estimates the cost of U.S. intellectual property theft at between $180 billion and $540 billion. Attackers will often recruit employees within an organization to facilitate the theft, meaning organizations must be prepared for insider threats as well as external attackers.

However, today industrial espionage is just the tip of the iceberg when it comes to source code. Attackers can steal and analyze source code in order to find vulnerabilities that can be used in later attacks. Or, as in the case of the recent SolarWinds attack, attackers can target source code as a way to deliver malicious code to downstream customers. Attacks like these represent not just a loss of intellectual property but a direct threat to an organization’s business.

This is significant because enterprises must consider a very different set of threat actors when source code is targeted as part of a larger attack chain rather than for pure IP theft. Specifically, an organization’s source code is increasingly in the crosshairs of APT groups, malware and exploit kit developers, and a wide range of financially motivated criminal groups. While source code theft is not the end goal of these actors, it is a prerequisite for the other threats they pose. Stealing source code allows attackers to methodically and patiently analyze it for security gaps and ways to exploit them. Even in a SolarWinds-style attack, adversaries need to analyze the code closely to understand how to insert malicious code that is both functional and blends in with valid code, making it harder to identify.

All of these threats can pose an existential risk to the business. The loss of IP to competitors or foreign nations can naturally mean a loss of market share. However, as the SolarWinds attack shows, even this loss pales in comparison to the potential business damage if a firm’s source code puts its customers at risk. To avoid these scenarios, organizations simply must be able to ensure that their critical code isn’t leaked.

Challenges in Protecting Source Code

Development pipelines naturally focus on how code flows in one direction: toward the build. Commits, builds, and testing are all tightly controlled. However, development tools don’t address, and shouldn’t be expected to address, the risk of code flowing away during the development and build process. Just like any other intellectual property, source code is prone to sprawl. Developers may copy code, share it with others, or use the same code across a wide range of applications and tools. Developers usually have considerable autonomy over their setups, and each typically has a unique set of preferred tools. This makes it easy for source code to end up in unexpected places or in a variety of third-party development tools. At that point, anyone with read access to that source code can become the first step in a breach.

Most security tools don’t fare much better. Source code is highly dynamic, unstructured data that is constantly being modified by scores of users. As such, it is almost impossible to control using traditional DLP regex signatures and tagging. Instead, security tools typically try to identify source code through other forms of pattern matching, with some experimenting with machine learning. These methods are extremely prone to false positives and errors. To make matters worse, even advanced DLP tools fail to reliably distinguish between an organization’s proprietary code and the massive amounts of open-source code used in modern development processes. Developers need to be able to use and share OSS code freely, and false positives from DLP tools can add tremendous friction to an agile development process.
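To make the false-positive problem concrete, here is a minimal sketch of a regex-style signature. The pattern, the function name, and the three sample strings are all invented for illustration and are not taken from any real DLP product; the point is only that a keyword signature cannot tell proprietary code from open-source code, or even from ordinary prose.

```python
import re

# Hypothetical signature: flag anything containing common Python keywords.
# Real DLP rules are more elaborate, but share the same basic limitation.
CODE_SIGNATURE = re.compile(r"\b(def|class|import|return)\b")


def looks_like_code(text: str) -> bool:
    """Return True if the hypothetical signature matches the text."""
    return CODE_SIGNATURE.search(text) is not None


# Illustrative samples (all made up for this sketch).
PROPRIETARY = "def price_order(order):\n    return order.total * SECRET_MARGIN\n"
OPEN_SOURCE = "import json\n\ndef load(path):\n    return json.load(open(path))\n"
PLAIN_EMAIL = "Please return the signed form to the class coordinator."

if __name__ == "__main__":
    for name, sample in [
        ("proprietary", PROPRIETARY),
        ("open source", OPEN_SOURCE),
        ("plain email", PLAIN_EMAIL),
    ]:
        print(f"{name}: flagged={looks_like_code(sample)}")
```

All three samples are flagged, so the signature alone gives a security team no basis for deciding which alert matters.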

Additionally, DLP tools are often applied only on egress and need the content to be in the clear for inspection. This presents two problems when it comes to source code. First, it misses the internal sprawl of code and risky or anomalous sharing, which can drastically increase an organization’s attack surface and potential for loss. Second, an attacker or malicious insider could intentionally obscure source code before exfiltration by encrypting it or sending it over an encrypted application.
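The second problem is easy to reproduce. The sketch below assumes a hypothetical egress check that scans outbound bytes for a function-definition pattern; the signature and the sample payload are made up for illustration. Simply compressing the payload with Python’s standard zlib module is enough to hide it from content inspection, even though the recipient can restore it exactly.

```python
import re
import zlib

# Hypothetical egress signature: looks for a Python function definition.
SIGNATURE = re.compile(rb"\bdef \w+\(")


def egress_inspect(payload: bytes) -> bool:
    """Return True if the outbound payload matches the signature."""
    return SIGNATURE.search(payload) is not None


# Made-up "proprietary" source, repeated so the payload is not trivially small.
source = b"def price_order(order):\n    return order.total * MARGIN\n" * 20

# An insider compresses the file before sending it out.
compressed = zlib.compress(source)

in_clear_flagged = egress_inspect(source)
compressed_flagged = egress_inspect(compressed)
recoverable = zlib.decompress(compressed) == source

if __name__ == "__main__":
    print("in the clear flagged:", in_clear_flagged)
    print("compressed flagged:  ", compressed_flagged)
    print("recoverable by recipient:", recoverable)
```

Encryption has the same effect, which is why inspection that depends on seeing content in the clear cannot be the only control.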

New Approaches and Best Practices

Data-tracing solutions such as Cyberhaven give organizations new options for managing the risks to their source code. The technology automatically tracks the full lineage of data across users, devices, and applications, including cloud-based assets. This means R&D, security, and IT teams can always see if source code has been copied or resides in unexpected locations.

With Cyberhaven, code can be classified based on its source, which allows teams to reliably distinguish between open-source components from GitHub and proprietary source code developed in-house, dramatically reducing false positives. The technology can also monitor any application or action on a user device that could create risk, such as copying and pasting code into personal email, instant messaging apps, or encryption software. It can likewise identify actions that might be used to hide source code from security tools, such as renaming files or encrypting or compressing the content.
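The idea of classifying by origin rather than by content can be illustrated with a toy model. This is not Cyberhaven’s implementation: the event format, the location strings, and the origin prefixes below are all assumptions made up for this sketch. Each copy is recorded as an event, and a file is classified by walking its copy chain back to where it started, so renaming or compressing along the way does not change the answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyEvent:
    """One hypothetical lineage record: data moved from src to dst."""
    src: str
    dst: str


# Assumed origin prefixes for this toy example only.
INTERNAL_ORIGINS = ("git.internal.example/",)
OSS_ORIGINS = ("github.com/",)


def trace_origin(location: str, events: list[CopyEvent]) -> str:
    """Follow copy events backwards until a location with no recorded parent."""
    parent = {event.dst: event.src for event in events}
    seen = set()
    while location in parent and location not in seen:
        seen.add(location)  # guard against cycles in the event log
        location = parent[location]
    return location


def classify(location: str, events: list[CopyEvent]) -> str:
    """Classify a file by where its content originally came from."""
    origin = trace_origin(location, events)
    if origin.startswith(INTERNAL_ORIGINS):
        return "proprietary"
    if origin.startswith(OSS_ORIGINS):
        return "open-source"
    return "unknown"


EVENTS = [
    CopyEvent("git.internal.example/billing/pricing.py", "laptop:/home/dev/pricing.py"),
    CopyEvent("laptop:/home/dev/pricing.py", "laptop:/tmp/notes.txt"),  # renamed
    CopyEvent("laptop:/tmp/notes.txt", "personal-cloud:/backup/notes.zip"),  # compressed
    CopyEvent("github.com/example/lib/util.py", "laptop:/home/dev/vendor/util.py"),
]

if __name__ == "__main__":
    for path in ("personal-cloud:/backup/notes.zip", "laptop:/home/dev/vendor/util.py"):
        print(path, "->", classify(path, EVENTS))
```

In this sketch the zipped file in a personal cloud is still reported as proprietary, while the vendored open-source file is not, without inspecting the content of either.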

These new perspectives can allow organizations to automatically check for a variety of risks that often go unnoticed. For example:

Identify any secondary users with access to source code. Have developers shared source code with other users in the organization? What devices or file shares contain source code, and who has access to those systems?
Identify any unexpected apps or storage that contain source code. Was source code accidentally backed up to a user’s personal cloud? Was code copied to unmanaged machines, unapproved cloud services, third-party CI/CD systems, etc.? Has source code been copied to USB drives?
Track how source code is shared with contractors and partners. Who has code been shared with? Was it over appropriate apps and channels? Do third-party developers still have access to source code even after a project is completed?
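As one example of how such a check could be automated, the sketch below flags third-party access that outlives its project, the last question in the list above. The record layout, user names, repository names, and dates are hypothetical; a real audit would pull this data from the organization’s own access and project systems.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    """Hypothetical access record for a contractor or partner."""
    user: str
    repo: str
    project_end: date
    revoked: bool


def stale_grants(grants: list[Grant], today: date) -> list[Grant]:
    """Return grants that are still active after the project has ended."""
    return [g for g in grants if not g.revoked and g.project_end < today]


GRANTS = [
    Grant("contractor-a", "billing", date(2020, 11, 30), revoked=False),
    Grant("contractor-b", "billing", date(2020, 11, 30), revoked=True),
    Grant("partner-c", "mobile-app", date(2021, 6, 30), revoked=False),
]

if __name__ == "__main__":
    for grant in stale_grants(GRANTS, today=date(2021, 1, 27)):
        print(f"{grant.user} still has access to {grant.repo}")
```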

These are just a few examples of the ways that source code can end up in unexpected places. And naturally, the more broadly code is shared, the greater the risk for that code to be lost. By gaining visibility and control over code, organizations can take easy, repeatable steps to mitigate that risk. And as threats continue to evolve, protecting the source code can mean protecting the business.

To learn more about how Cyberhaven can help protect your intellectual property, schedule a demo to see the product in action.
