Artificial intelligence (AI) has transformed from experimental technology to essential business tool in record time. Employees across industries now rely on platforms like Claude, Gemini, and Copilot to draft content, analyze data, write code, and accelerate productivity. As AI becomes embedded in everyday workflows, a new category of insider threat is emerging: one that is harder to detect, harder to classify, and potentially more damaging than anything security teams have faced before.
The simple act of pasting sensitive data into a chatbot or endpoint-based AI agent can lead to serious exposure. Intellectual property, customer records, source code, and strategic plans are all at risk. Security teams face a critical challenge: how do you enable AI innovation without creating pathways for AI data leakage and regulatory violations?
What Is an AI Insider Threat?
An AI insider threat occurs when employees use generative and agentic AI tools in ways that expose, leak, or compromise sensitive corporate data. Unlike traditional insider threats involving intentional sabotage or data theft, AI insider threats typically result from legitimate work activities. An employee debugging code in Copilot, summarizing financial data in Claude, or drafting client communications in Gemini is simply trying to work more efficiently. The threat emerges from the unintended consequences: sensitive data leaving corporate control and entering external AI systems, where it may be stored, logged, or used for model training.
What makes AI insider threats particularly dangerous is their scale and invisibility. Research from Cyberhaven Labs found that 39.7 percent of all AI interactions involve sensitive data. The people behind those interactions aren't malicious actors; they're productive employees who don't realize they're creating data security incidents.
The threat operates in the gap between employee intent and data reality. When a sales manager pastes client names and deal terms into an AI tool to generate a proposal, they see productivity. Security teams see uncontrolled data exfiltration to a third-party platform with unknown data retention policies.
The Explosive Growth of Shadow AI in the Workplace
AI adoption has accelerated faster than any enterprise technology in history. According to Cyberhaven Labs' 2026 AI Adoption & Risk Report, organizations with the highest rates of AI adoption interacted with hundreds of GenAI applications over the course of 2025: organizations in the 99th percentile of adoption used more than 300 GenAI tools, while those in the 95th percentile averaged more than 200.
The result is shadow AI at scale. Employees discover AI tools through social media, colleague recommendations, or online searches. They start using these tools without IT approval, security review, or data handling policies. Unlike sanctioned software that goes through procurement and security assessment, shadow AI appears in workflows instantly. One day an employee isn't using AI, the next day they're pasting proprietary data into multiple platforms.
This unstructured, user-initiated data movement remains largely invisible to traditional security tools. Sensitive business data is being copied and pasted into tools that may store prompts, use them to train models, or expose them to external systems. In many organizations, there are few guardrails in place.
How Generative AI Causes Data Leaks
Generative AI creates data leakage through mechanisms that differ fundamentally from traditional data exfiltration. The data doesn't leave through file downloads, email attachments, or USB drives. Instead, it flows out through browser-based interactions that security tools struggle to monitor.
The typical AI data leak follows this pattern: An employee copies text from a sensitive document, internal database, or proprietary system. They paste that content into an AI chat interface. The AI processes the prompt and generates a response. The employee uses that response in their work. At each step, the data has moved further from corporate control.
Once data has been submitted to an AI model, organizations may lose all control over it. Depending on the platform, that data could be stored indefinitely, reviewed by humans, or used to train future versions of the model. Some AI providers offer enterprise plans with enhanced data protection, but employees using personal accounts or free tiers receive no such guarantees. The result is silent data leakage that leaves no trace unless you're monitoring the right signals.
The nature of AI interactions makes them difficult to control. Prompts are often ad hoc, embedded in browser sessions, and not tied to specific files or systems. An employee might paste three sentences from a confidential strategy document, two cells from a financial model, and a paragraph from a customer email into a single prompt. Traditional security tools see browser activity. They don't see the data lineage connecting that activity to specific sensitive assets.
Why Traditional DLP Cannot Detect AI Leaks
Most legacy data loss prevention (DLP) tools were built to scan files and monitor known exfiltration channels like email, USB drives, and file-sharing services. According to the National Institute of Standards and Technology (NIST), traditional DLP focuses on content types and metadata but struggles to understand the context in which data is used.
When an employee copies sensitive content from a PDF and pastes it into a browser-based AI tool, there's often no file movement, no policy match, and no alert. The DLP system may log a copy event, but without understanding where the data originated or where it's going, the event appears benign.
Even newer DLP solutions struggle with AI data leakage. They may recognize that a copy-paste event happened or that a browser session is active, but they don't know where the data came from or whether the action was risky. They lack the ability to connect user behavior with data origin, which is essential for identifying AI-related threats.
Traditional DLP operates on pattern matching and predefined rules. It looks for credit card numbers, social security numbers, or specific file types crossing monitored boundaries. AI data leakage doesn't fit these patterns. An employee might paste a single paragraph containing your company's unreleased product roadmap. That paragraph contains no credit card numbers or structured data patterns. It's just text. But its disclosure could damage competitive positions significantly.
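To make the gap concrete, here is a minimal sketch in Python of the pattern-matching approach legacy DLP relies on. The regexes, rule names, and sample text are illustrative assumptions, not taken from any specific product:

```python
import re

# Patterns typical of rule-based DLP: structured identifiers only.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def legacy_dlp_scan(text: str) -> list[str]:
    """Return the rule names that match, the way a pattern-based scanner would."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

# A paragraph from an unreleased product roadmap: highly sensitive, but it
# contains none of the structured patterns the rules look for.
roadmap_excerpt = (
    "Q3 launch moves to October; pricing drops 20% to undercut the main "
    "competitor, and the new feature ships dark to enterprise pilots."
)

print(legacy_dlp_scan(roadmap_excerpt))                  # [] -- no match, no alert
print(legacy_dlp_scan("Card: 4111 1111 1111 1111"))      # ['credit_card']
```

The roadmap paragraph sails through untouched because nothing in it looks like a structured identifier, which is exactly the failure mode described above.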
The gap becomes even wider with conversational AI interfaces. The data doesn't move as a discrete file or email. It moves as unstructured text fragments across dozens or hundreds of interactions. Each interaction might seem innocuous. Collectively, they constitute a significant data exposure. Traditional DLP has no mechanism to aggregate these fragments or assess cumulative risk.
This gap is a major reason why many security teams remain blind to how their data is being used in AI tools. Unless a specific tool is blocked entirely, there's often no visibility at all. Outright blocking isn't a scalable solution. AI is becoming a business enabler, and organizations that fail to adopt it risk falling behind. Security teams need visibility and control, not just prohibition.
Examples of AI Data Exfiltration and Insider Incidents
The risks of AI misuse are not hypothetical.
- Manufacturing Industry Incident: Engineers at a global manufacturing firm used a generative AI tool to speed up technical documentation. They unknowingly pasted proprietary product designs and CAD file content into the tool. The company only discovered the issue after reviewing unusual network traffic. By then, the data had been retained by the AI provider.
- SaaS Company Data Exposure: Marketing employees at a SaaS company used generative AI to create customer presentations. They fed sensitive client data, including names, email addresses, and internal sales notes, into the AI tool. The prompt history was later exposed due to a vulnerability in the provider's platform. The company faced a public relations crisis and had to notify affected clients.
- Healthcare HIPAA Violation: A healthcare researcher inadvertently shared protected health information (PHI) with an AI model while summarizing patient survey results. The organization faced compliance exposure under HIPAA and was required to notify affected individuals despite no malicious behavior occurring.
These examples illustrate how easy it is for sensitive data to leak via AI and how hard it is to catch without purpose-built tools.
Explore how AI-native endpoint DLP can stop incidents like the ones mentioned above.
The Financial Impact of AI-Related Data Breaches
The emergence of AI as a workplace tool has introduced a new category of data breach with substantial financial implications. According to the IBM Cost of a Data Breach Report 2025, 13% of organizations reported breaches of AI models or applications, marking the first time AI-specific security incidents have been studied in this depth.
The costs are striking. Organizations experiencing security incidents involving shadow AI faced an additional $670,000 in breach costs compared to those with low or no shadow AI usage. This brings the average breach cost for organizations with high shadow AI levels to $4.63 million, compared to the global average of $4.44 million. The additional costs stem from longer detection and containment times, as these incidents took approximately a week longer than the global average to resolve.
The financial impact extends beyond direct breach costs. Organizations face regulatory fines for compliance violations, particularly in healthcare (HIPAA), financial services (SOX, GLBA), and global operations (GDPR). They face reputational damage when customers learn their data was exposed through AI tools. They face competitive disadvantage when proprietary information reaches competitors. The full cost of an AI data leak compounds over time.
Building an AI Governance Framework for Safe AI Usage at Work
Security doesn't have to come at the cost of innovation. Organizations can empower employees to use AI responsibly without risking data loss by building a comprehensive AI governance framework.
- Start with a clear AI usage policy that defines approved tools, prohibited actions, and data handling requirements. The policy should be specific enough to guide behavior but flexible enough to accommodate new tools and use cases. Communicate the policy broadly and make it easily accessible.
- Establish an AI approval process for new tools. When employees want to use a new AI platform, require a security review that assesses data retention policies, model training practices, compliance certifications, and integration capabilities. Approved tools should be added to a sanctioned list with clear usage guidelines.
- Implement technical controls that enforce policy automatically. Real-time monitoring and alerting catches violations as they happen, enabling immediate response. Automated blocking of high-risk actions prevents the most dangerous data exposures while allowing most legitimate AI usage to proceed unimpeded.
- Create feedback loops between security teams and business units. When security blocks or alerts on AI usage, explain why and offer approved alternatives. When employees request access to new AI capabilities, work collaboratively to find solutions that meet both business needs and security requirements. AI governance works best when it's viewed as enabling innovation rather than preventing it.
- Measure and report on AI security metrics. Track the number of AI-related incidents, the volume of sensitive data being shared with AI tools, the adoption rate of approved versus shadow AI, and the effectiveness of training programs. Use these metrics to refine your governance approach over time.
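As one illustration of that last point, the sketch below computes a few of those metrics from a generic event log. It is a minimal example under stated assumptions: the event fields, tool names, and approved-tool list are hypothetical placeholders, not tied to any particular monitoring product:

```python
from collections import Counter

# Hypothetical AI-usage events exported from whatever monitoring is in place.
events = [
    {"user": "ana",  "tool": "ChatGPT",         "sensitive": True},
    {"user": "ben",  "tool": "Claude",          "sensitive": False},
    {"user": "ana",  "tool": "UnknownAIWriter", "sensitive": True},
    {"user": "cara", "tool": "Gemini",          "sensitive": False},
]

APPROVED_TOOLS = {"Claude", "Gemini"}  # assumption: your sanctioned list

def ai_security_metrics(events):
    """Summarize AI usage: volume, sensitive-data share, shadow AI share, top tools."""
    total = len(events)
    sensitive = sum(e["sensitive"] for e in events)
    shadow = sum(e["tool"] not in APPROVED_TOOLS for e in events)
    return {
        "total_ai_interactions": total,
        "sensitive_data_share": sensitive / total if total else 0.0,
        "shadow_ai_share": shadow / total if total else 0.0,
        "top_tools": Counter(e["tool"] for e in events).most_common(3),
    }

print(ai_security_metrics(events))
```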
The age of AI is here. It's changing how we work and how we think about insider risk. With purpose-built AI data loss prevention capabilities and a comprehensive AI governance framework, you can embrace the opportunity without opening the door to avoidable threats.
How to Prevent AI Data Leaks in Enterprises
Protecting data from AI tools requires a fundamentally different approach to data security. Traditional perimeter defenses and file-based monitoring are insufficient. Organizations need AI data loss prevention capabilities that understand data lineage and user behavior.
- Implement Data Lineage: Rather than just scanning content or enforcing static rules, track the full journey of data from creation to final destination. If someone pastes a sensitive snippet into ChatGPT, your security system should know where that data came from, how it was classified, and whether the action violates policy.
- Deploy AI-Specific Security Policies: Create policies that specifically address AI tool usage. Define which AI platforms are approved, what types of data can be shared with each platform, and what requires additional approval. Policies should reflect data sensitivity levels, allowing non-sensitive data to flow freely while restricting confidential or regulated data (a minimal sketch of such a policy check appears after this list).
- Gain Real-Time Visibility into AI Usage: Monitor user actions across endpoints, browsers, and SaaS apps, correlating them with data movement and origin. If a user copies financial projections from a confidential Excel file and pastes them into an AI prompt, your security team should receive an immediate alert with full context showing exactly what was copied, from where, by whom, and into what platform.
- Educate Employees on AI Security Risks: Awareness training remains essential. Employees need to understand what types of data should never be shared with external models, how AI tools retain and use prompts, and the potential consequences of data exposure. Training should be specific, using real examples relevant to each role.
- Create Approved AI Tool Lists: Rather than attempting to block all AI tools, designate approved platforms with appropriate data protection guarantees. Enterprise versions of AI tools often include features like zero data retention, no model training on customer inputs, and compliance certifications. Directing employees to these approved tools reduces shadow AI risk.
- Implement Step-Up Authentication for Sensitive Operations: When employees attempt to paste highly sensitive data into AI tools, require additional verification. This could include manager approval, a security questionnaire, or redirection to an approved secure alternative. Step-up controls balance security with productivity.
- Monitor for Data Aggregation Risks: Individual AI interactions may seem harmless, but cumulative data exposure can be significant. Monitor patterns where employees make repeated queries that collectively expose sensitive information. AI governance requires understanding both individual actions and aggregate behavior.
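The sketch below ties together the policy and step-up ideas from this list. It is an assumption-laden illustration, not a reference implementation: the classification labels, platform names, and decision values are placeholders you would replace with your own.

```python
from dataclasses import dataclass

@dataclass
class PasteEvent:
    user_role: str        # e.g. "engineering", "marketing"
    classification: str   # e.g. "public", "internal", "confidential", "regulated"
    destination: str      # e.g. "ChatGPT", "GitHub Copilot"

# Assumption: platforms your organization has vetted and approved.
APPROVED = {"GitHub Copilot", "Claude Enterprise"}

def evaluate(event: PasteEvent) -> str:
    """Return 'allow', 'alert', 'step_up', or 'block' for a paste into an AI tool."""
    if event.classification == "regulated":
        return "block"                     # PHI, PCI, etc. never leaves
    if event.destination not in APPROVED:
        return "alert" if event.classification == "public" else "block"
    if event.classification == "confidential":
        return "step_up"                   # require manager approval or justification
    return "allow"

print(evaluate(PasteEvent("engineering", "confidential", "GitHub Copilot")))  # step_up
print(evaluate(PasteEvent("marketing", "internal", "ChatGPT")))               # block
```

In practice these decisions would also factor in user role and cumulative exposure, but even this simple shape shows how policy can discriminate by data sensitivity and destination rather than blocking AI outright.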
How Cyberhaven Provides AI Data Security Through Data Lineage
Cyberhaven addresses the AI insider threat challenge through data lineage technology. Rather than attempting to classify every text snippet or monitor every browser session in isolation, Cyberhaven traces the full journey of data from its creation to its final destination.
When a user copies financial projections from a confidential Excel file and pastes them into ChatGPT, Cyberhaven captures the complete context. The system knows the data originated in a file classified as confidential, tracks the user action through the clipboard, observes the paste event into a browser-based AI tool, and correlates all these signals to generate a single, actionable alert.
The platform captures user actions across endpoints, browsers, and SaaS apps. This comprehensive visibility eliminates blind spots where AI data leakage typically occurs. Security teams receive alerts that include exactly what was copied, from where, by whom, and into what platform. This level of detail enables rapid response and accurate incident investigation.
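Conceptually, lineage-based detection correlates a chain of low-level events into a single contextual alert. The sketch below is a purely illustrative model of that idea, not Cyberhaven's implementation, API, or data format; every field and event name is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str      # "file_open", "clipboard_copy", "browser_paste"
    detail: dict

# Hypothetical event chain for one user session.
session = [
    Event("file_open", {"path": "Q4_forecast.xlsx", "classification": "confidential"}),
    Event("clipboard_copy", {"source": "Q4_forecast.xlsx"}),
    Event("browser_paste", {"destination": "chat.openai.com"}),
]

def correlate(events):
    """Stitch copy/paste events back to the classified source they came from."""
    origin = next((e.detail for e in events if e.kind == "file_open"), None)
    paste = next((e.detail for e in events if e.kind == "browser_paste"), None)
    if origin and paste and origin["classification"] == "confidential":
        return {
            "alert": "confidential data pasted into AI tool",
            "source": origin["path"],
            "destination": paste["destination"],
        }
    return None

print(correlate(session))
```

The value is in the correlation: any one of those events looks benign on its own, but linked together they describe exactly the kind of exposure the preceding paragraphs discuss.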
Cyberhaven allows organizations to define AI-specific policies tailored to their risk tolerance and compliance requirements. You can monitor or block data being pasted into AI tools, create exceptions for trusted use cases, or implement step-up enforcement based on the sensitivity of the data involved. Policies can vary by user role, data classification, and destination platform.
For example, you might allow engineering teams to use GitHub Copilot with source code while blocking the same code from being pasted into ChatGPT. You might permit marketing teams to use generative AI for public content while restricting them from pasting customer data. These nuanced policies enable AI adoption while maintaining data protection.
The system also provides analytics showing which AI tools are being used across the organization, what types of data are being shared, and which teams have the highest risk exposure. This visibility enables security leaders to make informed decisions about AI governance and prioritize education efforts where they'll have the greatest impact.
Learn why AI demands a new approach to data security at the enterprise level.
Explore AI adoption and risk across industries in the 2026 AI Adoption & Risk Report.
Frequently Asked Questions
What are AI insider threats?
AI insider threats occur when employees expose sensitive corporate data by entering it into AI tools like ChatGPT, Gemini, or Microsoft Copilot. Unlike traditional insider threats, which typically involve malicious intent, AI-related threats are almost always unintentional. Employees use these tools to work faster, not realizing they are creating data security risks. Common examples include pasting source code, financial data, customer records, or internal strategy documents into an AI chatbot for analysis or drafting help.
How do AI tools cause data leaks?
AI tools cause data leaks when employees copy and paste sensitive information into AI chatbots or assistants. Once that data is submitted, it may be stored by the AI provider, used to train future models, or exposed through platform vulnerabilities. The risk is especially difficult to manage because these actions happen entirely within a browser, with no file movement involved. That makes them invisible to most traditional security controls, which are built to detect file transfers, not copy-paste events.
Why can't traditional DLP tools detect AI data leaks?
Traditional data loss prevention (DLP) tools were designed to monitor file movements across known exfiltration channels such as email, USB drives, and cloud file-sharing services. They were not built for the way AI tools work. When an employee pastes data into ChatGPT, there is no file transfer to detect. Traditional DLP tools also cannot trace where copied data originated, classify its sensitivity level, or understand the context behind the action. Without that context, identifying a policy violation is nearly impossible.
What is data lineage and why does it matter for AI security?
Data lineage is the ability to trace data from its point of creation through every transformation, movement, and use across its lifecycle. In the context of AI security, data lineage answers critical questions: Where did this data come from? Who accessed it? How was it classified? Did sharing it with an AI tool violate a security policy? Without data lineage, security teams can detect that something was pasted into an AI platform, but cannot determine what it was, how sensitive it was, or whether it represented a genuine threat. With it, organizations can investigate incidents accurately and build policies based on context, not just content.
How can organizations prevent AI data leaks without blocking productivity?
Organizations can reduce AI data leakage risk without restricting access to AI tools by taking a context-aware approach to data security. Key steps include:
- Deploying DLP solutions that use data lineage to understand where data came from and how sensitive it is, not just what it contains
- Creating AI-specific policies that monitor or block sensitive data from being pasted into AI platforms
- Educating employees on safe AI usage, including what types of data should never be shared with external AI tools
- Establishing clear, role-specific guidelines on permissible AI use cases