Data Mining for Protected Data with Canopy
19 October, 2020

Canopy Software recently released a case study highlighting how our data mining innovations decrease the time our customers spend on protected-data discovery and data breach response. When this customer’s data mining process left them with ~90,000 documents to review, they enlisted Canopy to help further reduce the review population.

In this case, Canopy’s innovative methods of data mining for Personally Identifiable Information (PII) and Protected Health Information (PHI) improved our customer’s results by approximately 78%. One data miner achieved this in approximately 15 hours using our software. What made this possible? Here, we spotlight a few components that routinely impress the expert data miners who use our system.

First, the workflow enabled by Canopy’s tools is unparalleled. “Data mining with Canopy is intuitive, streamlined, and efficient,” said Alanna Dent, a former Canopy Data Mining Analyst who worked with the customer highlighted in the case study. “I was able to effectively data mine the full document set on my own in approximately 15 hours with no prior knowledge of the case.”

Second, this speed was enabled by groundbreaking machine-learning-based features, such as Canopy’s Classification tools. Alanna trained our customer to use these tools, which helped them quickly reduce their review population. “My favorite tools in this case,” she explained, “were Document and Image Classifications, which allow files to be sorted into and out of review in bulk, drastically reducing time and stress.”

Canopy’s Classification tools allow data miners to quickly sort documents based on topic, document type, and other characteristics. Users who are data mining for PII can use Classification tools to automatically group an entire set of images that may contain PII, such as driver’s licenses. They can then view the grouped images together and quickly verify by eye that the images should be swept into the review population.

Third, Canopy’s Classification tools significantly reduce the time spent on a project, and they are supported by the speed-focused design of the Canopy system as a whole. With features ranging from protected-data detection during processing to tracking the accuracy of data mining, the entire Canopy Protected-Data Discovery system is designed to make data breach response and data mining for PII and PHI faster.

As a former protected-data researcher and Data Mining Analyst, Alanna gained first-hand experience with the overall impact of the Canopy system’s tools and features. Drawing on this experience, she said, “Canopy's multitude of machine-learning-based technologies work together to form a cohesive product that has changed the landscape of data breach response.”

Interested in learning more about data mining with Canopy? We welcome you to read the case study mentioned in this article.

Canopy’s Protected-Data Discovery system is proven to help teams achieve higher accuracy and faster review speed with less effort. To schedule a demo of Canopy’s Protected-Data Discovery system, please contact us.


  •   Sophia Rosman
  •   Data Breach