Cyber Insurers: Stop Getting Ripped Off on Post-Breach Data Mining

Canopy Team November 29, 2022


Cyber insurers, do your data mining panel providers use ediscovery tools & techniques for post-breach incident response & PII/PHI review? If so, you need to read this post, because you’re probably overpaying on cyber liability claims to the tune of millions per year.


The Total Cost of Post-Breach Data Mining

When it comes to data mining, many old-school panel providers talk loosely about finding personally identifiable information (PII), protected health information (PHI), and other types of personal data. They sometimes even offer “free” initial culling to make themselves attractive to insurers.

But the initial culling is just one of three main phases encompassed by incident response data mining. In order to evaluate total project cost holistically, it’s crucial to understand each of these phases:

  • Initial Culling — Searching for PII/PHI/personal data to deliver a set of potentially sensitive documents (called the review population) that require human review.
  • PII/PHI Review — Manually looking at each document in the review population for PII/PHI, and connecting each element of personal data to a person.
  • Entity Consolidation — Deduplicating all of the people discovered in the PII/PHI Review to deliver a consolidated list of affected individuals.

Want to learn more about the most efficient approach to post-breach data mining? Download The Claims Manager's Guide to Data Breach Response!

According to NetDiligence’s 2022 Cyber Claims Study, forensics (including data mining) has taken up an average of 53% of crisis services costs over the past five years. So how the initial culling is done has a big impact on overall claims costs — especially if the provider is using ediscovery tools and techniques.


The Initial Culling's Trickle-Down Effect

Many well-known data breach response providers use an ediscovery approach for the initial culling, which relies heavily on keyword searching and regular expressions (regex). This results in two major consequences:

  1. Over-Inclusivity — Because ediscovery tools and techniques are notoriously bad at identifying personal data, data miners set wide parameters. This leads to significantly larger-than-necessary review populations.
  2. Under-Inclusivity — At the same time that they are flagging non-sensitive data, ediscovery tools and techniques tend to miss actual personal data. This unreliability can push forensics leads to send the entire data set off for manual review to minimize the risk of missing important information.
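To make both failure modes concrete, here is a minimal Python sketch of regex-based culling. The pattern, file names, and document text are all hypothetical and chosen only for illustration; they are not taken from any provider's actual search terms.

```python
import re

# Hypothetical culling pattern: nine digits, optionally split 3-2-4
# by hyphens or spaces (a common shape for a U.S. Social Security number).
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

# Hypothetical documents from a breached data set.
documents = {
    "hr_form.txt": "Employee SSN: 123-45-6789",
    "invoice.txt": "Order reference 987654321 shipped Monday",
    "scan_ocr.txt": "Social Security No. 123.45.6789",
}

def cull(docs):
    """Return the names of documents the pattern flags for human review."""
    return sorted(name for name, text in docs.items() if SSN_PATTERN.search(text))

review_population = cull(documents)
print(review_population)
```

In this sketch the invoice is flagged even though its nine-digit order reference is not personal data (over-inclusivity), while the scanned form is missed because its SSN uses periods as separators (under-inclusivity). Loosening the pattern to catch the scanned form would flag even more non-sensitive documents, which is the trade-off described above.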

The initial culling phase requires much less time and manpower than the subsequent PII/PHI review. So from the perspective of data mining providers, it’s pretty low-risk to offer culling as a complimentary service — which many do, knowing that they can then bill for humans to review hundreds of thousands or even millions of unnecessary documents due to overly inclusive review populations.

And because the ediscovery approach was developed for finding legally responsive documents (not a list of people & PII/PHI), it can lead to some unpleasant surprises for data breach response. Ediscovery review tools are slow & inefficient at this process (as they weren’t designed for it), documents aren’t handled in the right order, and even the reviewers themselves can have the wrong skillset. All of this results in frequent time & cost adjustments in the final hour, when it’s too late to change course.

In short: the cost of “free” culling quickly adds up when providers use old-school techniques & tools. And insurers end up footing the bill, potentially paying millions of dollars more per year on data mining.

Using ediscovery tools for PII review can cost cyber insurers tens of thousands per incident.

Download our white paper: The Inflated Cost of Data Breach Response (And How We Got Here) to learn more.

Then Why Do Some IR Folks Use Ediscovery Tools & Techniques?

When the risk of cyber incidents started to grow, purpose-built software for identifying PII/PHI/personal data and connecting it to affected people didn't exist. So cyber experts borrowed software and methodologies from the closest adjacent market: ediscovery. 

But ediscovery tools and processes didn’t perfectly align with incident response, so teams had to make a lot of adjustments to fit their needs. They cobbled together culling strategies using keyword searching and regex. They shoehorned legal review software into a PII review workflow. And they consolidated lists of people using Excel or custom objects. It was manual, repetitive, time-consuming work, but it was the best option at the time.

By the time data breach response software came into the picture, the biggest providers had built businesses, made staffing decisions, and based revenue projections around these adapted solutions. They were highly profitable reviewing far more documents than necessary, and a purpose-built tool would have disrupted that. So they continue to operate in a way that benefits them at the expense of cyber insurers.


Purpose-Built Software Is Changing the Game

In 2018, Canopy launched the world’s first (and still only) software built exclusively for data breach response, giving progressive incident response teams the tool they needed to disrupt the data mining scene. Using Canopy’s software, they’ve developed a faster, more streamlined, and overall cheaper approach to tackling every phase of data breach response.

To start, Canopy’s tool leverages machine learning for significantly faster and more reliable PII, PHI, and personal data identification. It’s defensible, too, with algorithms that continuously learn from billions of detected personal data elements (and counting). This means that the initial culling can often be completed by one person in one day. And more importantly, the results are significantly more accurate, so reviewers are looking at documents that actually contain personal data instead of wasting their time with documents that don’t.

Initial culling has a huge impact on the total cost of data mining. See how one of our partners saved its client over $300,000 on a single project using Canopy.

Our team also understands the significant differences between PII/PHI review and ediscovery review. So in addition to focusing reviewers’ efforts with improved culling functionality, we’ve built AI-powered review workflows to further speed up their process. These workflows largely take the guesswork out of review, turning the linking of personal data and people into more of an accept-or-reject process.

Download Wotton + Kearney's case study to see the power of Canopy's AI-driven review workflows.

AI further benefits the data breach response process through entity consolidation. Instead of building custom object/SQL databases or manually dealing with spreadsheets, a data breach response expert can automatically deduplicate identical people with one click in Canopy. The application even suggests entity relations, as might be seen with nicknames or maiden names, making it easy to merge them (and all of their personal data).
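For readers unfamiliar with entity consolidation, the following Python sketch shows the general idea under simple assumptions. It is not Canopy's algorithm: the records, the normalization rules, and the "shared SSN" heuristic for suggesting related entities are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (name, ssn) pairs produced by a PII/PHI review.
records = [
    ("Katherine Jones", "123-45-6789"),
    ("katherine  jones", "123456789"),     # same person, different formatting
    ("Katherine Smith", "123-45-6789"),    # same SSN, different surname
    ("Robert Lee", "987-65-4321"),
]

def normalize(name, ssn):
    """Lowercase and collapse whitespace in names; keep only digits in SSNs."""
    clean_name = " ".join(name.lower().split())
    clean_ssn = "".join(ch for ch in ssn if ch.isdigit())
    return clean_name, clean_ssn

def consolidate(recs):
    """Collapse exact duplicates, then group remaining entities that share an SSN."""
    unique = sorted({normalize(name, ssn) for name, ssn in recs})
    by_ssn = defaultdict(list)
    for name, ssn in unique:
        by_ssn[ssn].append(name)
    # Entities sharing an identifier are only suggested for merging, not merged.
    suggestions = {ssn: names for ssn, names in by_ssn.items() if len(names) > 1}
    return unique, suggestions

entities, merge_suggestions = consolidate(records)
print(len(entities), merge_suggestions)
```

Here the two differently formatted "Katherine Jones" rows collapse into one entity automatically, while "Katherine Smith" is only surfaced as a suggested merge because a shared identifier with a different surname (a maiden name, for example) still warrants a human decision.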

How do you consolidate billions of entity records within HIPAA-mandated breach notification timelines? Download our case study to find out. 

Reassess Your Panel to Save Millions per Year

How do your panel providers approach each phase of data mining? If their process and tools are influenced by ediscovery, then odds are you’re spending way too much servicing cyber liability claims. By trading in ediscovery tools for Canopy's Data Breach Response software, incident response teams can complete projects much faster at a fraction of the cost.

Visit Canopy's Partners page to find data mining providers that are already leveraging purpose-built data breach response software and a more streamlined workflow to deliver better results for breached companies and their cyber insurers.