Refactor DRAFT
The Canopy Incident Response Data Mining (IRDM) Framework operates on two parallel tracks: the Data Track (Volume) and the Legal Track (Application). Running both tracks through each phase systematically narrows a chaotic dataset into a defensible, finalized notification set.
| Phase | Data Track Action (Volume) | Legal Track Action (Application) |
|---|---|---|
| 1. Processing | Normalize: Intake, deduplication, and indexing. | Strategize: Establishing jurisdictional hypotheses. |
| 2. Assessment | Search: Identifying high-probability documents. | Scope: Defining “qualified” data with Counsel. |
| 3. Review | Mine: Extracting granular data fragments. | Finalize: Locking the defensible protocol. |
| 4. Consolidation | Deduplicate: Merging into unique entities. | Qualify: Applying legal notification bars. |
Phase 1 (Processing) Goal: Eliminate “Technical Debt” and establish a search-reliable index.
- Legal Track: Counsel establishes a broad jurisdictional view (Residency Mapping, Harm Thresholds, and Regulatory Timing).
- Data Track: AI-driven detection runs across a deduplicated, De-NISTed dataset. Analysts identify known client data patterns (e.g., specific Employee ID formats).
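A minimal sketch of the normalization step behind a deduplicated, De-NISTed dataset. It assumes files are deduplicated by SHA-256 content hash and that De-NISTing means dropping any file whose hash appears on a known-file list (such as a NIST NSRL-derived list). The function names, sample paths, and sample hash list are hypothetical and do not describe Canopy's actual implementation.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def normalize(files: dict[str, bytes], known_system_hashes: set[str]) -> dict[str, str]:
    """Return {hash: first_path_seen} after De-NISTing and deduplicating."""
    unique: dict[str, str] = {}
    for path, content in files.items():
        digest = sha256_hex(content)
        if digest in known_system_hashes:
            continue  # De-NIST: drop known system/application files
        unique.setdefault(digest, path)  # keep only the first path per hash
    return unique


# Hypothetical intake: one record, an exact copy of it, and a system binary.
files = {
    "hr/roster.csv": b"name,employee_id\nAda,EMP-004217\n",
    "backup/roster_copy.csv": b"name,employee_id\nAda,EMP-004217\n",
    "windows/notepad.exe": b"MZ-fake-system-binary",
}
known_system_hashes = {sha256_hex(b"MZ-fake-system-binary")}

index = normalize(files, known_system_hashes)
```

Exact-hash matching only catches byte-identical copies; near-duplicates (for example, the same spreadsheet re-saved) would need a different technique.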
Phase 2 (Assessment) Goal: Narrow the document universe to a defensible “Review Set.”
- Legal Track: Analysts and Counsel define the Review Protocol: the rules governing which data is “qualified.”
- Data Track: Targeted searches filter the dataset to documents that potentially contain qualified entities, which are not yet deduplicated at this stage.
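The targeted-search step can be illustrated with a small pattern filter. This is a sketch under stated assumptions: the two regular expressions (a U.S. SSN layout and a made-up `EMP-` employee ID format) are hypothetical examples of search terms, and real protocols would define their own patterns with Counsel.

```python
import re

# Hypothetical search patterns; a real protocol defines its own.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "employee_id": re.compile(r"\bEMP-\d{6}\b"),
}


def build_review_set(documents: dict[str, str]) -> dict[str, list[str]]:
    """Map each document with at least one hit to the pattern names that hit."""
    review_set: dict[str, list[str]] = {}
    for doc_id, text in documents.items():
        hits = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
        if hits:
            review_set[doc_id] = hits
    return review_set


documents = {
    "doc-1": "Payroll record for Ada Lovelace, SSN 123-45-6789.",
    "doc-2": "Lunch menu for Friday.",
    "doc-3": "Badge issued to EMP-004217.",
}

review_set = build_review_set(documents)
```

Recording which pattern matched, rather than a bare yes/no, keeps the filter auditable when the Review Set is later defended.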
Phase 3 (Review) Goal: Granular extraction of PII/PHI.
- Legal Track: The protocol is finalized. Any mid-stream changes require a return to “Second-Pass Assessment.”
- Data Track: Reviewers (GenAI or human) extract specific data elements and link them to individuals per the protocol.
Phase 4 (Consolidation) Goal: Produce the final, unique notification list.
- Legal Track: The Qualification Test is applied against jurisdictional thresholds (e.g., 500+ residents).
- Data Track: Entity Management deduplicates the list to ensure each unique person is counted once, regardless of document frequency.
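The two consolidation steps above can be sketched together: merge extracted records into unique people, then count residents per jurisdiction and compare against a threshold. The matching key (normalized name plus SSN digits), the record fields, and the demo threshold of 2 are all assumptions for illustration; actual matching rules and thresholds come from the finalized protocol.

```python
from collections import defaultdict


def normalize_key(name: str, ssn: str) -> tuple[str, str]:
    """Build a matching key that ignores case, extra spaces, and SSN dashes."""
    return (" ".join(name.lower().split()), ssn.replace("-", ""))


def consolidate(records: list[dict]) -> dict:
    """Merge records into one entity per person, tracking source documents."""
    entities: dict = {}
    for rec in records:
        key = normalize_key(rec["name"], rec["ssn"])
        entity = entities.setdefault(
            key, {"name": rec["name"], "state": rec["state"], "documents": set()}
        )
        entity["documents"].add(rec["doc_id"])
    return entities


def residents_by_state(entities: dict) -> dict[str, int]:
    """Count unique people per state, regardless of document frequency."""
    counts: dict[str, int] = defaultdict(int)
    for entity in entities.values():
        counts[entity["state"]] += 1
    return dict(counts)


def states_over_threshold(counts: dict[str, int], threshold: int) -> list[str]:
    """Return states whose unique-resident count meets the threshold."""
    return sorted(state for state, n in counts.items() if n >= threshold)


# Hypothetical extracted records; Ada appears in two documents.
records = [
    {"doc_id": "doc-1", "name": "Ada Lovelace", "ssn": "123-45-6789", "state": "CA"},
    {"doc_id": "doc-7", "name": "ada  lovelace", "ssn": "123456789", "state": "CA"},
    {"doc_id": "doc-2", "name": "Grace Hopper", "ssn": "987-65-4321", "state": "CA"},
    {"doc_id": "doc-3", "name": "Alan Turing", "ssn": "555-12-3456", "state": "NY"},
]

entities = consolidate(records)
counts = residents_by_state(entities)
flagged = states_over_threshold(counts, threshold=2)  # demo value, not a legal threshold
```

Counting happens only after merging, so a person who appears in many documents still contributes one resident to the jurisdictional total.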
Core Objective: Establish the “Initial Hypothesis” before data intake to optimize configuration.
- The “Plus” Factor: Determining whether the data meets the definition of PII (Name + high-risk identifier).
- Jurisdiction Mapping: Evaluating the residency of affected individuals against specific laws (e.g., CCPA, GDPR).
- Safe Harbor Evaluation: Assessing whether encryption status removes the obligation to notify.
- Discovery Timing: Identifying regulatory clocks (e.g., 72-hour GDPR windows or 30-day CA requirements).
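As an illustration of the Discovery Timing item, regulatory clocks can be tracked as fixed offsets from a discovery timestamp. The sketch below uses the two windows named above; the labels, the sample discovery time, and the assumption that each clock starts at the same moment are illustrative only, since when a clock actually starts is a legal determination for Counsel.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the examples in the text; labels are hypothetical.
CLOCKS = {
    "GDPR (72 hours)": timedelta(hours=72),
    "CA (30 days)": timedelta(days=30),
}


def notification_deadlines(discovered_at: datetime) -> dict[str, datetime]:
    """Return the deadline for each clock, measured from discovery."""
    return {label: discovered_at + window for label, window in CLOCKS.items()}


# Hypothetical discovery time, kept timezone-aware to avoid ambiguity.
discovered = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
deadlines = notification_deadlines(discovered)

for label, deadline in deadlines.items():
    print(f"{label}: notify by {deadline.isoformat()}")
```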
| Factor | Key Impact on Notification |
|---|---|
| Data Type | Sensitive PII vs. Encrypted data (Safe Harbor). |
| Residency | Determines which state/country statutes apply. |
| Volume | High counts trigger Attorney General/Regulator reports. |
| Risk | “No reasonable likelihood of harm” may waive notice. |
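The factors in the table above can be read as an ordered set of checks that produce an initial hypothesis for a single finding. The following is a rough sketch, not legal logic: the `Finding` fields, the list of high-risk identifiers, and the outcome labels are hypothetical, and the real definitions vary by statute and are set by Counsel.

```python
from dataclasses import dataclass

# Hypothetical set of identifiers treated as "high-risk" for this sketch.
HIGH_RISK = {"ssn", "drivers_license", "financial_account"}


@dataclass
class Finding:
    has_name: bool
    identifiers: set[str]
    encrypted: bool
    harm_likely: bool


def is_pii(finding: Finding) -> bool:
    """The "Plus" test: a name plus at least one high-risk identifier."""
    return finding.has_name and bool(finding.identifiers & HIGH_RISK)


def initial_hypothesis(finding: Finding) -> str:
    """Walk the factors in order and return a working label for Counsel."""
    if not is_pii(finding):
        return "not qualified"
    if finding.encrypted:
        return "safe harbor candidate"
    if not finding.harm_likely:
        return "risk-of-harm review"
    return "notify"


example = Finding(has_name=True, identifiers={"ssn"}, encrypted=False, harm_likely=True)
print(initial_hypothesis(example))
```

Each label is a starting point for review rather than a decision; the volume and residency factors are applied later, once unique individuals have been counted.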
Core Objective: Define the technical boundaries of the dataset.
- Exclusionary Date Range: Culling data outside the breach/remediation window.
- Excluded Folders/Metadata: Applying client-directed exclusions during processing.
- Custom Formats: Identifying company-specific data patterns for specialized detection rules.
- Logistics: Selecting the optimal transfer method (Secure Browser vs. Cloud-to-Cloud).
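The date-range and folder exclusions above amount to a scoping filter applied during processing. A minimal sketch follows, assuming each file carries a relative path and a last-modified date; the window dates, folder names, and the `in_scope` helper are hypothetical configuration values, not Canopy settings.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical client-directed configuration.
WINDOW_START = date(2025, 1, 15)
WINDOW_END = date(2025, 2, 28)
EXCLUDED_FOLDERS = {"marketing/assets", "it/installers"}


def in_scope(path: str, modified: date) -> bool:
    """Keep a file only if it is inside the window and outside excluded folders."""
    if not (WINDOW_START <= modified <= WINDOW_END):
        return False  # outside the breach/remediation window
    parents = {str(p) for p in PurePosixPath(path).parents}
    return not (parents & EXCLUDED_FOLDERS)


items = [
    ("hr/roster.csv", date(2025, 2, 1)),
    ("marketing/assets/logo.png", date(2025, 2, 1)),
    ("hr/old_roster.csv", date(2023, 6, 1)),
]

kept = [path for path, modified in items if in_scope(path, modified)]
```

Checking every parent folder, rather than only the immediate one, means files nested deeper inside an excluded folder are culled as well.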
| Workflow | Objective | Documentation Link |
|---|---|---|
| Intake Integrity | Hashing and Chain of Custody. | Intake Procedures |
| Protocol Sign-off | Finalizing legal triggers with Counsel. | Legal Calibration |
| Entity Management | Deduplicating and Qualifying. | Consolidation Guide |
Next Step: Phase 1: Processing (The Strategy)