
The 10 Most Common Forms of Company Data Employees Steal or Expose

Alex Lee

We analyzed the behavior of 1.4 million employees working in various industries, and found that customer or client data is the most common type of data exfiltrated by insiders.

High-profile insider threats have rocked some of the world’s most prominent companies in 2022. Earlier this year, a former Apple engineer was found to have AirDropped 24 gigabytes of confidential Apple car prototype designs to his wife’s laptop before joining a competing startup. A former Qualcomm lead engineer was also accused of stealing “confidential documents, processes, schematics, and diagrams” pertaining to next-generation chips and software.

Insider threats are more common than you think, but they’re not all due to malicious employees taking data on their way out the door. Even when someone accidentally exfiltrates important information, the impact can be substantial. To figure out what types of data are most at risk, we compiled insights from Cyberhaven’s product usage in our 2022 Insider Risk Report. The most affected data types are customer data, source code, and personally identifiable information (PII).

Key findings in the report include:

  • Nearly half (44.6%) of sensitive data that’s exfiltrated is client or customer data, about one-seventh (13.8%) is source code, and 8% is regulated personally identifiable information (PII).
  • Just 17.9% of exfiltrated data is classic regulated data such as PII, PCI, and PHI. Over 80% of sensitive data exfiltrated is harder-to-identify intellectual property (IP).

The 10 most common data types under threat

At 44.6%, customer or client data is the most common sensitive data employees exfiltrate. Modern enterprises are a goldmine of information about their customers and of files received from those customers. One possible explanation for why this is the most commonly exfiltrated data type is that employees may not understand the sensitivity of this information in the same way they do for, say, a secret ingredient, a product formula, or a medical record.

Source code is the second most common type of data exfiltrated, at 13.8%. Today, companies across verticals including airlines, retail, financial services, and manufacturing all develop their own applications and algorithms to gain a competitive advantage. Having source code leaked publicly or to a competitor can have a material impact on a company’s business.

Regulated data, including personally identifiable information (PII), payment card information (PCI), and protected health information (PHI), collectively accounts for just 17.9% of exfiltrated data. This information, which often follows a standard alphanumeric pattern, has historically been easier to classify using software and therefore easier to protect. Our analysis finds that over 80% of exfiltrated data is harder-to-identify intellectual property (IP).
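To illustrate why pattern-based data is easier to classify, here is a minimal sketch of how software might flag payment card numbers in text. It is not how Cyberhaven or any particular product works; the regular expression and the function names `luhn_valid` and `find_card_numbers` are hypothetical, chosen only for this example. It pairs a digit pattern with the Luhn checksum that card numbers use.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# A well-known test card number is flagged; a customer revenue line is not.
print(find_card_numbers("card 4111 1111 1111 1111 on file"))
print(find_card_numbers("Acme Corp, ARR $120,000"))
```

A customer list, a design file, or source code has no comparable fixed pattern or checksum, which is one reason such data is harder to identify with this kind of rule.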

What sensitive data is exfiltrated?

The top 10 types of sensitive data employees exfiltrate include:

  1. Client or Customer Data – For example, a spreadsheet exported from Salesforce or NetSuite showing all customers and the dollar amount of recurring revenue they’ve paid over the past year, an M&A target list that a publicly traded client of an investment bank shared with the bank’s deal team, etc.
  2. Source Code – For example, the code for a recommendation engine used by a popular social media app to determine what content to show users in their “for you page”, the logic an SMB lender uses to determine creditworthiness based on a company’s cash flows, etc.
  3. Regulated Personal Data (PII) – For example, an Illinois customer’s name and mailing address stored by a direct-to-consumer mattress company as part of its order management software, or a Berlin-based user’s date of birth stored by a consumer neobank, etc.
  4. Design files and product formulas – For example, a DWG file exported from AutoCAD with the part designs of a novel satellite in development, the secret ingredients and production process for a world-renowned carbonated soft drink, etc.
  5. Regulated health data (PHI) – Example: the medical record of a celebrity who was checked into the hospital following a serious car accident, a Microsoft Excel file downloaded from health insurance billing software with patient names and diagnostic codes, etc.
  6. Regulated financial data (PCI) – Example: a consumer’s credit card number stored in an internet provider’s billing app to process recurring payments, a CSV file of new customers containing their bank account and routing numbers, etc.
  7. Sensitive project files – Example: a folder of images taken with an unannounced smartphone that could be analyzed to reveal specifications of the new camera, an unreleased movie stored on a shared drive by a production house that makes movie trailers, etc.
  8. Company confidential – Example: an internal report that found use of the company’s supplements could increase the probability of health risks, an email thread between executives discussing how to tackle the impending ban of their service by a local government, etc.
  9. Unreleased or sensitive marketing – Example: an unreleased press release in Google Docs with details about the company’s upcoming new product line announcement, an advertising creative being developed in Figma with imagery of the company’s unannounced partnership, etc.
  10. Employee HR data – Example: a spreadsheet downloaded from SAP SuccessFactors with the salaries of all employees in the department, a document containing details about the planned bonus payouts for executives, etc.

Data security cannot be taken lightly anymore

All enterprises today are leveraging their data to generate more business value. As companies continue to store, leverage, and share more data, the risks to their businesses increase. To win customers’ trust, it’s critical that modern enterprises appropriately defend their clients’ data, source code, product formulas, and other important information. Securing data will become increasingly complex, as the vectors for data exfiltration will only continue to grow with every new device, software update, file type, and employee. Every company must take data security seriously, as each is just one data leak away from being fined by the government or losing its customers’ trust.
