Canopy Achieves “Impossible” Data Breach Response for Hospital Network

The Challenge

A large hospital network experienced a protected health information (PHI) breach involving more than 6,600 compromised PDFs. Some files contained up to 180,000 rows of data covering more than 150,000 individuals, and all were densely packed with PHI. The same patient often appeared many times, with different PHI in each record. The hospital network needed to act quickly to comply with the HIPAA Breach Notification Rule.

The Solution

Canopy’s algorithms recognized the tables inside the PDFs, extracted each data element, and transformed the data into a structured format. Canopy’s advanced PHI detection algorithms then identified each element of PHI. Finally, our machine learning models deduplicated the entities into a list of unique patients, each with all of their PHI and links back to the source documents.
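Canopy has not published how its deduplication models work. Purely to illustrate the general idea of collapsing repeated records into unique patients while keeping links to source documents, here is a minimal Python sketch. It assumes a hypothetical exact match key (normalized name plus date of birth) and hypothetical field names such as `source`, `name`, `dob`, `mrn`, and `phone`. Exact keys are a simplification: the machine learning approach described above is meant to catch typos and partial matches that a fixed key would miss.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Patient:
    """One deduplicated patient: every PHI value seen, plus the PDFs it came from."""
    phi: defaultdict = field(default_factory=lambda: defaultdict(set))
    sources: set = field(default_factory=set)


def match_key(record: dict) -> tuple:
    # Hypothetical match key: lowercased, whitespace-normalized name plus date of birth.
    # This is a stand-in for ML-based matching, not Canopy's method.
    name = " ".join(record.get("name", "").lower().split())
    return (name, record.get("dob", "").strip())


def deduplicate(records: list) -> dict:
    """Group extracted rows into unique patients, merging PHI and keeping source links."""
    patients = {}
    for rec in records:
        patient = patients.setdefault(match_key(rec), Patient())
        patient.sources.add(rec["source"])  # keep the link back to the source PDF
        for field_name, value in rec.items():
            if field_name != "source" and value:
                patient.phi[field_name].add(value)  # pool every PHI value seen
    return patients


# Hypothetical rows as they might come out of the table-extraction step.
rows = [
    {"source": "report_001.pdf", "name": "Jane Doe", "dob": "1980-01-02", "mrn": "A1"},
    {"source": "report_417.pdf", "name": "jane  doe", "dob": "1980-01-02", "phone": "555-0100"},
    {"source": "report_002.pdf", "name": "John Roe", "dob": "1975-05-06", "mrn": "B7"},
]
unique = deduplicate(rows)
print(f"{len(rows)} rows -> {len(unique)} unique patients")  # prints "3 rows -> 2 unique patients"
```

Both Jane Doe rows collapse into one patient whose record carries the MRN from one report and the phone number from the other, along with both source file names.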

"It was not humanly possible for our team to do this — it would have taken a couple hundred reviewers years to complete this project. We can’t even fathom the cost savings."

Project Lead

The Result

  • Automated data extraction from tables in over 6,600 lengthy Crystal Reports (PDFs), some containing over 180,000 rows
  • Deduplicated 4.28 billion entities to just 3 million unique patients, reducing the entity list by more than 99%
  • Enabled the hospital network to comply with the HIPAA Breach Notification Rule
  • Saved the team millions of person-hours and completed the “impossible” project in 15 days

By the Numbers

Over 6,600

Crystal Reports (lengthy PDFs)

4.28 billion

entities, many of them duplicates

15 days

for Canopy to complete the entire project

Get These Results for Your Data Breach Response

Request your personalized demo to see how Canopy's AI-powered solution can transform your workflow.