Updated March 21, 2023
Since ChatGPT launched three months ago, it has taken the world by storm. People are using it to create poems, essays for school, and song lyrics. It’s also making inroads in the workplace. According to data from Cyberhaven’s product, as of March 21, 8.2% of employees have used ChatGPT in the workplace and 6.5% have pasted company data into it since it launched. Some knowledge workers say that using the tool makes them 10 times more productive. But companies like JP Morgan and Verizon are blocking access to ChatGPT over concerns about confidential data.
The problem with putting company data into ChatGPT
OpenAI uses the content people put into ChatGPT as training data to improve its technology. This is problematic because employees are copying and pasting all kinds of confidential data into ChatGPT to have the tool rewrite it, from source code to patient medical records. Recently, an attorney at Amazon warned employees not to put confidential data into ChatGPT, noting, “we wouldn’t want [ChatGPT] output to include or resemble our confidential information (and I’ve already seen instances where its output closely matches existing material).”
Consider a few examples:
- A doctor inputs a patient’s name and details of their condition into ChatGPT to have it draft a letter to the patient’s insurance company justifying the need for a medical procedure. In the future, if a third party asks ChatGPT “what medical problem does [patient name] have?” ChatGPT could answer based on what the doctor provided.
- An executive inputs bullet points from the company’s 2023 strategy document into ChatGPT and asks it to rewrite them in the format of a PowerPoint slide deck. In the future, if a third party asks “what are [company name]’s strategic priorities this year,” ChatGPT could answer based on the information the executive provided.
On March 21, 2023, OpenAI temporarily shut down ChatGPT due to a bug that mislabeled chats in users’ histories with the titles of chats from other users. To the extent that those titles contained sensitive or confidential information, they could have been exposed to other ChatGPT users.
Identifying what data goes to ChatGPT isn’t easy
The traditional security products companies rely on to protect their data are blind to employee usage of ChatGPT. Before blocking ChatGPT, JP Morgan reportedly couldn’t determine “how many employees were using the chatbot or for what functions they were using it.” It’s difficult for security products to monitor usage of ChatGPT and protect data going to it for two reasons:
- Copy/paste out of a file or app — When workers input company data into ChatGPT, they don’t upload a file but rather copy and paste content into their web browser. Many security products are designed to protect files (which can be tagged as confidential) from being uploaded, but once content is copied out of a file, they are unable to keep track of it.
- Confidential data contains no recognizable pattern — Company data going to ChatGPT often doesn’t contain a recognizable pattern that security tools look for, like a credit card number or Social Security number. Without knowing more about its context, security tools today can’t tell the difference between someone inputting the cafeteria menu and the company’s M&A plans.
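The second limitation above can be illustrated with a minimal sketch. This is not any real security product’s implementation, just a toy pattern-based scanner of the kind described: it catches structured identifiers like Social Security numbers, but an unstructured strategy document and a cafeteria menu look identical to it.

```python
import re

# Toy pattern-based scanner (illustrative only, not a real DLP tool).
# It flags data that matches known formats but has no signal for
# unstructured confidential text.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def flags(text: str) -> list:
    """Return the names of the sensitive-data patterns found in text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

# A structured identifier is caught...
print(flags("Patient SSN: 123-45-6789"))
# ...but without context, an M&A plan and a cafeteria menu both pass.
print(flags("Acquire Acme Corp in Q3 to enter the EU market"))
print(flags("Tuesday special: grilled cheese and tomato soup"))
```

The first call flags the SSN, while the last two both return an empty list, which is exactly the gap described above: nothing in the text of a strategy document pattern-matches as “sensitive.”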
Despite some companies blocking ChatGPT, its use in the workplace is growing rapidly
Cyberhaven Labs analyzed ChatGPT usage for 1.6 million workers at companies across industries that use the Cyberhaven product. Since ChatGPT launched publicly, 8.2% of knowledge workers have tried using it at least once in the workplace. Furthermore, 3.1% of employees have put confidential company data into ChatGPT. Despite a growing number of companies outright blocking access to ChatGPT, usage continues to grow exponentially. On March 14, our product detected a record 5,267 attempts to paste corporate data into ChatGPT per 100,000 employees, defined as “data egress” events in the chart below.
Cyberhaven also tracks data ingress, such as employees copying data out of ChatGPT and pasting it elsewhere like a Google Doc, a company email, or their source code editor. Workers copy data out of ChatGPT more than they paste company data into ChatGPT, at a nearly 2-to-1 ratio. This makes sense because in addition to asking ChatGPT to rewrite existing content, you can simply type a prompt such as “draft a blog post about how problematic ChatGPT is from a data security standpoint” and it will write it from scratch. Full disclosure: this post was written the old-fashioned way by a human being. 🙂
The average company leaks sensitive data to ChatGPT hundreds of times each week
Sensitive data makes up 11% of what employees paste into ChatGPT, but since usage of ChatGPT is so high and growing exponentially this turns out to be a lot of information. During the week of February 26 – March 4, workers at the average company with 100,000 employees put confidential documents into ChatGPT 199 times, client data 173 times, and source code 159 times.
A few bad apples?
At the average company, just 0.9% of employees are responsible for 80% of egress events — incidents of pasting company data into the site. The number is still relatively small, but any one of the egress events we found could be responsible for exposing a critical piece of company data. There are many legitimate uses of ChatGPT in the workplace, and companies that navigate ways to leverage it to improve productivity without risking their sensitive data are poised to benefit.