The Assessment Phase represents the critical “Scoping” bridge of the IRDM framework. It transforms raw processed data into a defensible notification roadmap by narrowing the focus to documents highly likely to contain qualified personal data.
To build that roadmap, Assessment must achieve six core outcomes:
- Resolution of Technical Debt: Manually resolving “Search-Blind” files (handwriting, complex DBs, low-confidence OCR) to ensure a 100% reliable dataset for the review team.
- Defensible Culling: Systematically reducing the active review volume through validated, iterative sampling of both the included and excluded sets.
- Entity Forecasting: Predicting the volume and complexity of non-unique qualified entities to assist in resource planning and timeline estimates.
- Protocol Finalization: Transitioning from a broad legal hypothesis to a stable, signed-off Review Protocol that governs the next phase.
- Mining Trigger Validation: Proving through data testing exactly which element combinations (e.g., Name + SSN) mandate a notification entity.
- Jurisdictional Mapping: Validating the geographic spread of the affected population to confirm specific state, federal, or international legal obligations.
Before initiating scoping searches, the Assessment Lead must verify:
- Exception Audit: All “Processing Warnings” (decrypted files, OCR’d images) are accounted for in a manual review plan to ensure no “black holes” exist.
- Exclusionary Rules: Confirmation that Date Ranges, Excluded Folders, and Metadata Culling have been successfully applied.
- Infrastructure Readiness: AI-driven personal data detections (Pulse Checks) and Image Classification tags are filterable and visible within the module.
- Initial Protocol Draft: A preliminary list of Primary and Secondary elements is available to guide initial searches.
- Project Status: Documents are tagged as “Ready for Assessment” to enable phase-specific tools.
The objective is to ensure the dataset is “Search-Reliable” by resolving blind spots.
- Manual Triage: Inspect items that failed automated processing (e.g., corrupt files or password-protected docs).
- Specialized Examination: Review “Search-Blind” files such as databases (SQL/MDB), audio/visual transcripts, and images without text using Image Classification tags.
- OCR Validation: Sample documents with low OCR confidence to verify that sensitive data like account numbers remain legible.
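The OCR validation step can be sketched as a reproducible random draw from the low-confidence documents. This is a minimal illustration, not the platform's own tooling: the field names `doc_id` and `ocr_confidence`, the 0.80 threshold, and the sample size are all assumptions chosen for the example.

```python
import random

def sample_low_confidence(docs, threshold=0.80, sample_size=50, seed=42):
    """Return a reproducible random sample of documents whose OCR
    confidence falls below `threshold` (hypothetical field names)."""
    low = [d for d in docs if d["ocr_confidence"] < threshold]
    if len(low) <= sample_size:
        # Fewer low-confidence documents than requested: review them all.
        return low
    # A fixed seed lets the same sample be re-drawn for audit purposes.
    return random.Random(seed).sample(low, sample_size)

# Illustrative data only: 100 documents with confidence 0.00 .. 0.99.
docs = [{"doc_id": f"DOC-{i:04d}", "ocr_confidence": i / 100} for i in range(100)]
ocr_sample = sample_low_confidence(docs, threshold=0.80, sample_size=10)
```

Fixing the seed is a deliberate choice here: it makes the sample repeatable, which helps when the validation has to be explained later.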
The objective is to test geographic and data assumptions to establish a defensible Protocol.
- Residency Testing: Run targeted searches to identify the geographic spread of individuals, validating jurisdictional assumptions.
- Orphaned Data Analysis: Identify “Dependent” data (e.g., medical diagnosis) appearing without “Primary” elements (e.g., names) to determine handling rules.
- Known Entity Analysis: Compare client-provided lists (employees/customers) against detected hits to quantify the “Unknown” population.
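The orphaned-data and known-entity checks above reduce to simple set operations. The sketch below assumes each document carries a list of detected element types and that names can be matched after basic normalization; the element labels, field names, and the `normalize` helper are hypothetical, and real matching would likely need fuzzier logic.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())

def split_known_unknown(detected_names, known_names):
    """Split detected names into those on the client list and the 'Unknown' rest."""
    known = {normalize(n) for n in known_names}
    detected = {normalize(n) for n in detected_names}
    return detected & known, detected - known

def find_orphaned(docs, primary=frozenset({"name"}),
                  dependent=frozenset({"medical_diagnosis"})):
    """Return IDs of documents holding Dependent elements but no Primary element."""
    orphaned = []
    for d in docs:
        elements = set(d["elements"])
        if elements & dependent and not elements & primary:
            orphaned.append(d["doc_id"])
    return orphaned

# Illustrative data only.
client_list = ["Ada Lovelace", "Alan Turing"]
hits = ["ada  lovelace", "Grace Hopper"]
matched, unknown = split_known_unknown(hits, client_list)

sample_docs = [
    {"doc_id": "A", "elements": ["name", "medical_diagnosis"]},
    {"doc_id": "B", "elements": ["medical_diagnosis"]},
    {"doc_id": "C", "elements": ["name"]},
]
orphaned_ids = find_orphaned(sample_docs)
```

The size of `unknown` relative to `matched` gives a first estimate of how much of the affected population falls outside the client-provided lists.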
The objective is to cull the active set down to high-probability documents through an iterative loop.
- Sweep & Tag: Automatically tag documents containing Primary/Secondary pairs as ready_for_review.
- Statistical Sampling:
  - False Negative Check: Sample the “Out of Scope” set to ensure no qualified data was missed.
  - False Positive Check: Sample the “Ready for Review” set to ensure AI noise isn’t overwhelming the team.
- Finalize Triggers: Finalize the specific legal conditions that mandate the creation of a non-unique qualified entity.
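One pass of the sweep-and-sample loop can be sketched as follows. This is an illustration under stated assumptions: the element labels in `PRIMARY` and `SECONDARY` are examples only, the `elements` field is hypothetical, and the reviewer's findings are modeled as a plain set of document IDs. The manual-review path is left out for brevity.

```python
import random

# Example element sets; the real lists come from the Review Protocol.
PRIMARY = {"name"}
SECONDARY = {"ssn", "account_number"}

def sweep_and_tag(docs):
    """Tag documents holding at least one Primary and one Secondary element."""
    for d in docs:
        found = set(d["elements"])
        if found & PRIMARY and found & SECONDARY:
            d["tag"] = "ready_for_review"
        else:
            d["tag"] = "out_of_scope"
    return docs

def draw_sample(docs, tag, size, seed=0):
    """Draw a reproducible random sample from the documents carrying `tag`."""
    pool = [d for d in docs if d["tag"] == tag]
    return random.Random(seed).sample(pool, min(size, len(pool)))

def flagged_rate(sample, flagged_ids):
    """Share of sampled documents that the reviewer flagged."""
    if not sample:
        return 0.0
    return sum(1 for d in sample if d["doc_id"] in flagged_ids) / len(sample)

# Illustrative data only.
docs = [
    {"doc_id": "D1", "elements": ["name", "ssn"]},
    {"doc_id": "D2", "elements": ["name"]},
    {"doc_id": "D3", "elements": ["ssn"]},
    {"doc_id": "D4", "elements": []},
]
sweep_and_tag(docs)

# False Negative Check: sample the excluded set; a reviewer flags misses.
fn_sample = draw_sample(docs, "out_of_scope", size=3)
false_negative_rate = flagged_rate(fn_sample, flagged_ids={"D3"})
```

The same `draw_sample` and `flagged_rate` calls serve the False Positive Check by sampling `ready_for_review` and flagging documents that turn out to hold no qualified data. A non-zero rate on either side feeds the next iteration of the trigger rules.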
- Assessed Auto-Review Set: Documents confirmed for automated/keyword assessment.
- Manual Review Tagged Set: Specialized set for human extraction (handwriting, complex DBs).
- Manual Triage & Exception Log: Defensive justification for all unprocessable data exclusions.
- Finalized Review Protocol: Signed-off definitions of Primary, Secondary, and Dependent elements.
- Raw Entity Forecast: Statistical summary of expected volume, complexity, and geographic distribution.
The Assessment Phase concludes only when both the Review Set and the Review Protocol are finalized.
- Review Set Locking: All documents tagged as ready_for_review, manual_review_required, or out_of_scope.
- Statistical Sign-off: Final sampling report confirms zero False Negatives in the excluded-set sample at the established confidence level.
- Counsel Approval: Formal sign-off on the Mining Triggers and Entity Layout.
- Readiness for Mining: Training materials delivered to the review team; Canopy project status set to “Ready for Review.”
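For the statistical sign-off, it helps to state what a zero-miss sample actually supports. Finding zero False Negatives in a random sample does not prove the excluded set contains none; it bounds the miss rate at a chosen confidence level. The sketch below uses the standard zero-failure binomial bound and assumes simple random sampling from an excluded set much larger than the sample. The function names and the 95% default are illustrative assumptions, not values taken from the framework.

```python
import math

def miss_rate_upper_bound(sample_size, confidence=0.95):
    """Upper bound on the true miss rate when a random sample of
    `sample_size` documents contains zero misses.
    Solves (1 - p) ** n = 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)

def sample_size_for_bound(max_rate, confidence=0.95):
    """Smallest zero-miss sample that bounds the miss rate at `max_rate`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_rate))

# Example: sample size needed to support "miss rate at most 1%" at 95%.
required_n = sample_size_for_bound(max_rate=0.01, confidence=0.95)
bound_at_required_n = miss_rate_upper_bound(required_n, confidence=0.95)
```

Recording the sample size, confidence level, and resulting bound in the final sampling report makes the sign-off easier to defend than a bare statement that no misses were found.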
Next Phase: Review (The Mining)