DLP technologies have been around for a long time, and while they have made some incremental improvements over the years, no one really loves their DLP. It’s one of those things that most organizations either learn to tolerate or give up on altogether. The reasons are well known: DLP tools require a lot of work to find and classify the data to be protected, they’re inaccurate and prone to both false positives and false negatives, and they introduce friction and frustration for end users. The list goes on.
Cyberhaven tackles DLP in a new way. But instead of diving into acronyms and data science jargon, I thought it might be more fun to describe what makes Cyberhaven unique by looking at the state of DLP through the lens of Christopher Nolan’s new classic, Memento. In short, we are going to take a look at all the problems you can solve when your DLP has a perfect memory, and also, all the things that can go wrong if it doesn’t.
Making Decisions Without a Memory
To set the stage, Memento tells the story of Lenny, a man who has lost the ability to form new memories. Lenny knows who he is, but every few minutes his memory fades and he wakes up with no context of where he is, why he is there, or what he was doing. To compensate, he constantly leaves himself reminders by taking Polaroids and writing notes to himself. And by the way, he is armed and looking for vengeance against those who have wronged him. Without giving away too many spoilers, let’s just say his system is... prone to errors.
Most DLPs today are in a very similar situation. Every time a user sends a file, the DLP wakes up with no history or context about the file. Like Lenny with his Polaroids and notes, the DLP can only rely on signatures or rules to decide what to do next. Does this file look like the one that I’m looking for? It sort of looks the same, but not quite. Should I let it through or block it?
Of course, there are many ways a DLP can analyze content, ranging from simple signatures (e.g. 16 numbers must be a credit card number) to custom-defined rules, natural language processing (NLP), and machine learning. But regardless of how the analysis is performed, a common problem remains: sensitive content can look very much like regular content and vice versa. Is that presentation a highly sensitive product roadmap or just the plan for the holiday party? When the DLP wakes up with no memory and only a fixed set of content rules, mistakes are inevitably going to be made. And those mistakes translate to disrupted work, more work for staff and managers, or ultimately the loss of sensitive data.
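To make the "16 numbers must be a credit card" problem concrete, here is a minimal Python sketch of a signature-style check. It is an illustration only, not how any particular DLP product works; the function names and sample strings are made up for this example. The stricter variant adds a Luhn checksum, a common way to weed out random digit runs.

```python
import re

# Naive signature: 16 digits, optionally separated by single spaces or dashes.
SIXTEEN_DIGITS = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def naive_flag(text: str) -> bool:
    """Signature only: any 16-digit run is treated as a card number."""
    return SIXTEEN_DIGITS.search(text) is not None


def stricter_flag(text: str) -> bool:
    """Signature plus checksum: fewer false positives, same blind spots."""
    return any(luhn_valid(m.group()) for m in SIXTEEN_DIGITS.finditer(text))


# Hypothetical samples, chosen to show each failure mode.
tracking = "Tracking number 1234567812345678 shipped today"
test_card = "Card on file: 4111 1111 1111 1111"
dotted = "Card on file: 4111.1111.1111.1111"

for sample in (tracking, test_card, dotted):
    print(naive_flag(sample), stricter_flag(sample), "-", sample)
```

The tracking number trips the naive signature even though it is not a card number, and the dotted card number slips past both checks because the pattern only expects spaces or dashes. Tightening the rule fixes one case at a time, but neither check knows anything about where the text came from.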
Building a DLP With Context and Perfect Recall
Cyberhaven gets to the heart of the problems that have plagued traditional DLPs for years. One of the most important differences lies in the way the solution classifies and tracks sensitive data in the first place. Unlike a traditional DLP, Cyberhaven has a perfect memory of all of an organization’s data - how it was created and every time it was subsequently shared, copied, or modified including by whom and over what application.
This immediately brings a wealth of additional context that we can use to make better DLP decisions. Data provenance, for instance, can tell us a great deal about what kind of data is involved and how sensitive it is. Consider an employee uploading a file to their personal Google Drive. Even though the file passed through many hands before it reached that employee, Cyberhaven can tell that the content was originally pulled from the company’s Git repository and likely contains proprietary source code. Or it can tell that the data a user is about to copy and paste into a chat application was originally created by the CFO and downloaded from a shared drive of company financial data. This long-term memory of where data comes from and who has interacted with it gives us a much clearer picture of what is sensitive and what is not. So instead of relying on one of Lenny’s Post-it notes, we can make our decisions while knowing the full history of every piece of data. When we combine this with content analysis, we suddenly have a far more accurate picture of what to do next.
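The idea of deciding based on where data came from can be sketched in a few lines of Python. This is a toy model written for this post's Git-to-personal-drive scenario, not a description of Cyberhaven's actual implementation; `Event`, `LineageLog`, `decide`, and the location labels are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    item: str                     # identifier of the data produced by this event
    action: str                   # e.g. "create", "copy", "share"
    actor: str                    # who performed the action
    location: str                 # where the resulting data lives
    parent: Optional[str] = None  # item this data was derived from, if any


class LineageLog:
    """Remembers every event so any item can be traced back to its origin."""

    def __init__(self) -> None:
        self._events: Dict[str, Event] = {}

    def record(self, event: Event) -> None:
        self._events[event.item] = event

    def history(self, item: str) -> List[Event]:
        chain: List[Event] = []
        current: Optional[str] = item
        while current is not None:
            event = self._events[current]
            chain.append(event)
            current = event.parent
        return list(reversed(chain))  # oldest event first

    def origin(self, item: str) -> Event:
        return self.history(item)[0]


# Assumed policy inputs: which origins are sensitive, which destinations are personal.
SENSITIVE_ORIGINS = {"git:company-repo", "share:finance"}
PERSONAL_DESTINATIONS = {"personal-google-drive", "personal-chat"}


def decide(log: LineageLog, item: str, destination: str) -> str:
    """Block when sensitive-origin data is headed to a personal destination."""
    origin = log.origin(item)
    if origin.location in SENSITIVE_ORIGINS and destination in PERSONAL_DESTINATIONS:
        return "block"
    return "allow"


log = LineageLog()
log.record(Event("src-v1", "create", "dev_a", "git:company-repo"))
log.record(Event("src-copy", "copy", "dev_a", "laptop:dev_a", parent="src-v1"))
log.record(Event("notes.txt", "share", "dev_b", "laptop:dev_b", parent="src-copy"))
log.record(Event("party.pptx", "create", "dev_b", "laptop:dev_b"))

print(decide(log, "notes.txt", "personal-google-drive"))
print(decide(log, "party.pptx", "personal-google-drive"))
```

In this sketch, `notes.txt` looks harmless by name and location, but its history leads back to the repository, so the upload is blocked. The holiday-party deck has no sensitive ancestor and goes through. A real system would also need to handle merges, partial copies, and content that mixes several sources, which this toy model ignores.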
Secondly, this approach turns another aspect of DLP on its head. Instead of relying on staff or users to do the hard work of finding, defining, and tagging all the data that needs to be protected, Cyberhaven proactively finds where sensitive data resides across the enterprise, including endpoints, USB drives, applications, and the cloud. Since all data is automatically traced, the solution can find sensitive data in unexpected places that would normally be missed.
When Securities Collide
Our long-term memory analogy also applies to another big problem facing DLP today. As discussed earlier, most DLPs put all their eggs in the single basket of content analysis. But what happens when the DLP can no longer see into the content? The DLP is now like Lenny waking up in a strange place, but this time with no photos or notes to guide his decisions. He’s paralyzed.
Applications of all types, such as WhatsApp, Zoom, and Box, increasingly encrypt content from end to end. Sensitive files may be encrypted, password protected, or sent in protected archives. Ironically, these and other protections that an organization can use to make their data safer can actually break their DLP functionality. Cyberhaven’s complete history and context means that the DLP continues to do its job even if the content is obscured. From a security perspective this means that organizations can pursue defense in depth for their data with the knowledge that one set of security controls won’t break one of the others.
Of course, this same issue applies to users who may try to intentionally evade a DLP. For years, end users have learned that if they get blocked while sending a file, they can just zip the file and it will typically go through without a problem. The same process with a password on the zip file could allow a malicious insider to evade the company DLP with virtually no effort or skill required.
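The zip trick is easy to reproduce. The short Python sketch below uses only the standard library and a deliberately simple byte-pattern scanner (an assumption for illustration; many real DLPs do unpack unencrypted archives). It shows that a scanner looking only at raw bytes loses sight of the content once it is compressed, and can see it again only after unpacking, which is exactly the step a password-protected archive prevents.

```python
import io
import re
import zipfile

# Simple content signature: 16 digits with optional spaces or dashes.
CARD_PATTERN = re.compile(rb"(?:\d[ -]?){15}\d")


def scan(payload: bytes) -> bool:
    """Return True if the raw bytes contain something shaped like a card number."""
    return CARD_PATTERN.search(payload) is not None


# Hypothetical document containing a well-known test card number.
document = b"Card on file: 4111 1111 1111 1111\n" * 20

# The user zips the file before sending it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes.txt", document)
zipped = buffer.getvalue()

# A scanner that knows how to open the archive can still recover the content.
with zipfile.ZipFile(io.BytesIO(zipped)) as archive:
    unpacked = archive.read("notes.txt")

print("plain file flagged:   ", scan(document))
print("zipped bytes flagged: ", scan(zipped))
print("unpacked file flagged:", scan(unpacked))
```

The original and the unpacked copy match the pattern, while the compressed bytes do not. A decision based on the file's history does not depend on being able to read those bytes at all.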
The challenges of DLP have existed for years and are well known to anyone who has had to implement it. To solve these challenges, the industry has typically resorted to two things: increasingly complex rules and more flavors of content analysis. However, if the problem is that you have all your eggs in one basket, the solution is not to simply build a deeper basket. For DLP, the problem is that the technology has consistently tried to solve highly nuanced problems while analyzing them from only a single perspective. By analyzing additional contexts and continuously tracking them over time, we can make real-time policy decisions that are truly informed. And this can keep our data safe without subjecting our employees, partners, and customers to those unfortunate Memento-esque mistakes.