
Phase 2: Assessment (The Scoping)

The Assessment Phase is the “Scoping” bridge of the IRDM framework. It turns processed data into a defensible notification roadmap by narrowing the focus to documents highly likely to contain qualified personal data.

1. Phase Goals

To produce that roadmap, Assessment must achieve six core outcomes, organized into two tracks:

Data Track: “Search”

Resolution of Technical Debt: Manually resolving “Search-Blind” files (handwriting, complex DBs, low-confidence OCR) so that every file handed to the review team is either reliably searchable or explicitly routed to manual review.
Defensible Culling: Systematically reducing the active review volume through validated, iterative sampling of both the included and excluded sets.
Entity Forecasting: Predicting the volume and complexity of non-unique qualified entities to assist in resource planning and timeline estimates.

Legal Track: “Scope”

Protocol Finalization: Transitioning from a broad legal hypothesis to a stable, signed-off Review Protocol that governs the next phase.
Mining Trigger Validation: Proving through data testing exactly which element combinations (e.g., Name + SSN) mandate the creation of a notification entity.
Jurisdictional Mapping: Validating the geographic spread of the affected population to confirm specific state, federal, or international legal obligations.

2. Entry Criteria (The Transition Gate)

Before initiating scoping searches, the Assessment Lead must verify:

A. Integrity of the Search Index

Exception Audit: All “Processing Warnings” (decrypted files, OCR’d images) are accounted for in a manual review plan to ensure no “black holes” exist.
Exclusionary Rules: Confirmation that Date Ranges, Excluded Folders, and Metadata Culling have been successfully applied.
Infrastructure Readiness: AI-driven personal data detections (Pulse Checks) and Image Classification tags are filterable and visible within the module.

B. Preliminary Legal Scope

Initial Protocol Draft: A preliminary list of Primary and Secondary elements is available to guide initial searches.
Project Status: Documents are tagged as “Ready for Assessment” to enable phase-specific tools.

3. High-Level Activities

Activity A: Examine Data (Technical Debt Resolution)

The objective is to ensure the dataset is “Search-Reliable” by resolving blind spots.

Manual Triage: Inspect items that failed automated processing (e.g., corrupt files or password-protected docs).
Specialized Examination: Review “Search-Blind” files such as databases (SQL/MDB), audio/visual transcripts, and images without text using Image Classification tags.
OCR Validation: Sample documents with low OCR confidence to verify that sensitive data like account numbers remain legible.
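
The OCR Validation step can be sketched as a reproducible random draw from the low-confidence pool. This is a minimal illustration, not the platform's own tooling: the `Doc` record, the `ocr_confidence` field, the 0.80 threshold, and the sample size are all hypothetical and would be set by the Assessment Lead.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    ocr_confidence: float  # hypothetical 0.0-1.0 score exported from processing


def sample_low_confidence(docs, threshold=0.80, sample_size=50, seed=42):
    """Return a reproducible random sample of documents below the OCR threshold."""
    # Sort by ID so the same seed yields the same sample regardless of input order.
    low = sorted((d for d in docs if d.ocr_confidence < threshold),
                 key=lambda d: d.doc_id)
    rng = random.Random(seed)  # fixed seed keeps the draw auditable
    k = min(sample_size, len(low))  # never request more than the pool holds
    return rng.sample(low, k)


if __name__ == "__main__":
    docs = [Doc(f"D{i:03d}", i / 100) for i in range(100)]
    for doc in sample_low_confidence(docs, sample_size=5):
        print(doc.doc_id, doc.ocr_confidence)
```

Recording the seed and threshold alongside the sample lets the same set be regenerated later if the exclusion decisions are challenged.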

Activity B: Examine Personal Data Detection (Assumptions Testing)

The objective is to test geographic and data assumptions to establish a defensible Protocol.

Residency Testing: Run targeted searches to identify the geographic spread of individuals, validating jurisdictional assumptions.
Orphaned Data Analysis: Identify “Dependent” data (e.g., medical diagnosis) appearing without “Primary” elements (e.g., names) to determine handling rules.
Known Entity Analysis: Compare client-provided lists (employees/customers) against detected hits to quantify the “Unknown” population.
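
Orphaned Data Analysis and Known Entity Analysis both reduce to simple set operations once detections are exported. The sketch below assumes each document's detections arrive as a set of element labels and that names can be matched after basic case and whitespace normalization; the label names and the matching rule are illustrative assumptions, not the finalized Protocol.

```python
# Hypothetical element labels; the real lists come from the Review Protocol draft.
PRIMARY = {"name"}
DEPENDENT = {"medical_diagnosis"}


def is_orphaned(detected):
    """True when Dependent data appears without any Primary element."""
    return bool(detected & DEPENDENT) and not (detected & PRIMARY)


def _normalize(name):
    # Lowercase and collapse runs of whitespace so trivial variants match.
    return " ".join(name.lower().split())


def unknown_population(detected_names, known_names):
    """Return detected names that do not appear on the client-provided list."""
    known = {_normalize(n) for n in known_names}
    return {n for n in detected_names if _normalize(n) not in known}


if __name__ == "__main__":
    print(is_orphaned({"medical_diagnosis"}))
    print(unknown_population(["Ada Lovelace", "alan  turing"], ["Alan Turing"]))
```

Real name matching usually needs more than normalization (nicknames, initials, transposed fields), so the count this produces is best read as an upper estimate of the “Unknown” population.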

Activity C: Identify Review Set (Culling & Validation)

The objective is to cull the active set down to high-probability documents through an iterative loop.

Sweep & Tag: Automatically tag documents containing Primary/Secondary pairs as ready_for_review.
Statistical Sampling:
- False Negative Check: Sample the “Out of Scope” set to ensure no qualified data was missed.
- False Positive Check: Sample the “Ready for Review” set to ensure AI noise isn’t overwhelming the team.
Finalize Triggers: Finalize the specific legal conditions that mandate the creation of a non-unique qualified entity.
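
The culling loop can be sketched as a tagging rule plus a rate calculation over reviewer-checked samples. This is a simplified illustration: treating SSN and account number as Secondary elements, and the `search_blind` flag, are assumptions made for the example; the tag names are the ones used in this phase.

```python
# Hypothetical element labels; the signed-off Protocol defines the real ones.
PRIMARY = {"name"}
SECONDARY = {"ssn", "account_number"}


def sweep_tag(detected, search_blind=False):
    """Assign one of the three phase tags to a document."""
    if search_blind:
        # Files the index cannot see go to human extraction, not to exclusion.
        return "manual_review_required"
    if detected & PRIMARY and detected & SECONDARY:
        return "ready_for_review"
    return "out_of_scope"


def observed_rate(sample_results):
    """Fraction of sampled documents a reviewer flagged as True.

    For the False Negative Check, True means qualified data was found in an
    out_of_scope document; for the False Positive Check, True means a
    ready_for_review document contained no qualified data.
    """
    results = list(sample_results)
    if not results:
        raise ValueError("sample is empty")
    return sum(results) / len(results)


if __name__ == "__main__":
    print(sweep_tag({"name", "ssn"}))
    print(observed_rate([True, False, False, False]))
```

If either observed rate is higher than the team is willing to accept, the element pairs or thresholds are adjusted and the sweep and sampling are repeated, which is the iterative loop this activity describes.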

4. Phase Deliverables & Analytics

Assessed Auto-Review Set: Documents confirmed for automated/keyword assessment.
Manual Review Tagged Set: Specialized set for human extraction (handwriting, complex DBs).
Manual Triage & Exception Log: Defensible justification for all unprocessable data exclusions.
Finalized Review Protocol: Signed-off definitions of Primary, Secondary, and Dependent elements.
Raw Entity Forecast: Statistical summary of expected volume, complexity, and geographic distribution.

5. Exit Criteria (The Handoff Gate)

The Assessment Phase concludes only when both the Review Set and the Review Protocol are finalized.

Review Set Locking: All documents are tagged as ready_for_review, manual_review_required, or out_of_scope.
Statistical Sign-off: Final sampling report shows zero False Negatives observed in the sample of the excluded set, which bounds the miss rate at the established confidence level.
Counsel Approval: Formal sign-off on the Mining Triggers and Entity Layout.
Readiness for Mining: Training materials delivered to the review team; Canopy project status set to “Ready for Review.”
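
A sample with zero observed False Negatives does not prove the excluded set is clean; it puts an upper bound on the miss rate. The sketch below pairs that bound with a check that every document carries a final tag. It assumes a simple random sample and uses the exact one-sided bound for zero hits, 1 - (1 - confidence)^(1/n); the function names and the acceptable miss rate are hypothetical and would be agreed with counsel.

```python
ALLOWED_TAGS = {"ready_for_review", "manual_review_required", "out_of_scope"}


def unlocked_documents(doc_tags):
    """Return IDs of documents whose tag is missing or not a final tag."""
    return sorted(doc_id for doc_id, tag in doc_tags.items()
                  if tag not in ALLOWED_TAGS)


def zero_hit_upper_bound(sample_size, confidence=0.95):
    """Upper bound on the true miss rate when a random sample shows zero hits."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


def ready_for_handoff(doc_tags, sample_size, false_negatives,
                      max_miss_rate, confidence=0.95):
    """Apply the Review Set Locking and Statistical Sign-off checks together."""
    if unlocked_documents(doc_tags):
        return False
    if false_negatives > 0:
        return False
    return zero_hit_upper_bound(sample_size, confidence) <= max_miss_rate


if __name__ == "__main__":
    tags = {"DOC-1": "ready_for_review", "DOC-2": "out_of_scope"}
    print(ready_for_handoff(tags, sample_size=300, false_negatives=0,
                            max_miss_rate=0.01))
```

Under these assumptions, a clean sample of 300 documents supports a miss rate of just under 1% at 95% confidence, while a clean sample of 100 supports only about 3%, so the sample size should be fixed before the final draw rather than after.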

Next Phase: Review (The Mining)