Use of Predictive Coding for Large-Volume Document Review
1. Introduction
Predictive coding, a form of technology-assisted review (TAR), applies machine learning in e-discovery to classify and prioritize documents in large-scale litigation and investigations.
It predicts relevance from training sets coded by human reviewers, which can substantially reduce the cost and time of manual review.
2. Key Concepts
How Predictive Coding Works
Human reviewers label a sample of documents as relevant or irrelevant.
The system learns patterns from this sample.
Predictive algorithms rank the remaining documents, allowing reviewers to focus on high-probability relevant documents.
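The three steps above can be sketched in a few lines. This is a minimal illustration only, assuming a naive Bayes word model with add-one smoothing; commercial review platforms use their own, usually more sophisticated, models. The function names (`train`, `relevance_score`, `rank`) and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled_docs):
    """Learn word frequencies from (text, is_relevant) pairs coded by reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Log-odds of relevance; assumes the seed set has both labels."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[True] / doc_counts[False])
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one (Laplace) smoothing avoids zero probabilities.
        p_rel = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_irr = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_irr)
    return score

def rank(model, unreviewed):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(unreviewed, key=lambda t: relevance_score(model, t), reverse=True)

# Hypothetical seed set coded by human reviewers.
seed_set = [
    ("merger pricing agreement draft", True),
    ("pricing terms for the merger", True),
    ("lunch menu for friday", False),
    ("office parking reminder", False),
]
model = train(seed_set)
ranked = rank(model, ["friday parking update", "revised merger pricing"])
print(ranked)
```

Reviewers would then work through `ranked` from the top, so the documents most likely to be relevant are seen first.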
Types of Predictive Coding
Binary Classification: Documents marked relevant/irrelevant.
Multi-Level Classification: Documents classified for multiple issues (privilege, relevance, confidentiality).
Benefits
Efficiency: Handles millions of documents quickly.
Cost Reduction: Fewer manual reviewers required.
Consistency: Reduces human error in large-volume reviews.
Auditability: Training sets and predictions can be validated and explained.
Validation Techniques
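Random Sampling: Review a random sample of the documents the system coded as non-relevant to estimate how many relevant documents were missed.
Statistical Measures: Recall (the share of all relevant documents that the review found) and precision (the share of retrieved documents that are actually relevant) are used to assess effectiveness.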
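As a small worked example of these measures, the sketch below computes precision and recall for a validation sample in which each document has both a system prediction and a human reviewer's decision. The function name and the sample values are illustrative assumptions, not data from any real matter.

```python
def precision_recall(predicted, actual):
    """Compare system predictions with reviewer decisions on the same sample."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Precision: of the documents predicted relevant, how many really are.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the documents that really are relevant, how many were found.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Hypothetical eight-document validation sample.
predicted = [True, True, True, True, False, False, False, False]
actual = [True, True, True, False, True, True, False, False]
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Here three of the four predicted-relevant documents are truly relevant (precision 0.75), and three of the five truly relevant documents were found (recall 0.6).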
3. Legal Framework and Acceptance
United States (Federal Courts)
Federal Rule of Civil Procedure (FRCP) 26(b)(1) limits discovery to matters that are relevant and proportional to the needs of the case.
Courts have permitted predictive coding where the process is defensible, transparent, and validated.
International Recognition
Courts in Canada, the UK, Singapore, and Australia have recognized TAR as acceptable in large-volume document review.
Key principles: efficiency, proportionality, and accuracy.
4. Landmark Cases
Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012)
First published U.S. judicial opinion approving predictive coding in large-scale discovery.
The court emphasized the defensibility and validation of the process rather than the novelty of the technology.
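Global Aerospace Inc. v. Landow Aviation, L.P., No. CL 61040 (Va. Cir. Ct. Apr. 23, 2012)
Virginia state court allowed the defendants to use predictive coding, over the plaintiffs' objection, to review a very large volume of electronically stored information.
The order left the receiving party free to challenge the completeness of the production afterward.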
Pyrrho Investments Ltd v. MWB Property Ltd, [2016] EWHC 256 (Ch), UK
First reported English High Court decision approving predictive coding, in a disclosure exercise involving more than three million documents.
Federal Housing Finance Agency v. Nomura Holding America Inc., 2015 WL 1130538 (S.D.N.Y.)
Court approved statistically validated predictive coding, emphasizing cost-effectiveness and accuracy.
Apple Inc. v. Samsung Electronics Co., 2012 WL 707163 (N.D. Cal.)
Court allowed TAR for complex patent litigation, noting that predictive coding reduced review time while maintaining defensibility.
Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (S.D.N.Y. 2015)
Court approved the parties' stipulated TAR protocol and observed that courts will permit a producing party to use TAR for document review, while stressing that the process should remain defensible and transparent.
5. Practical Implementation
Develop a Training Set
Select a representative subset of documents for human review.
Iterative Learning
Refine the predictive model through multiple rounds of training and review.
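The iterative rounds can be pictured as a loop: rank what is left, review the top batch, feed the reviewers' decisions back into the model, and repeat. The sketch below is a deliberately simplified illustration. The term-weight "model", the `reviewer` callback, and the stopping rule (stop after a batch with no relevant documents) are all assumptions made for the example; a real review would rely on a statistically validated stopping point.

```python
from collections import Counter

def score(weights, text):
    """Sum the learned weights of the distinct words in a document."""
    return sum(weights[word] for word in set(text.lower().split()))

def review_in_rounds(documents, reviewer, seed_terms, batch_size=2, max_rounds=10):
    """Alternate between ranking, human review, and model updates."""
    weights = Counter({term: 1 for term in seed_terms})
    unreviewed = list(documents)
    coded = {}
    for _ in range(max_rounds):
        if not unreviewed:
            break
        # Rank the remaining documents with the current model.
        unreviewed.sort(key=lambda doc: score(weights, doc), reverse=True)
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        found = 0
        for doc in batch:
            label = reviewer(doc)  # human decision: True means relevant
            coded[doc] = label
            found += label
            # Feed the decision back into the model.
            for word in set(doc.lower().split()):
                weights[word] += 1 if label else -1
        if found == 0:
            break  # simplified stopping rule, an assumption for this sketch
    return coded, unreviewed

# Hypothetical collection and a stand-in for the human reviewer.
documents = [
    "merger pricing memo",
    "merger timeline and pricing",
    "timeline for board approval",
    "holiday party invite",
    "parking garage closed",
    "party catering order",
    "garage parking permit",
]
reviewer = lambda doc: "merger" in doc or "board" in doc
coded, unreviewed = review_in_rounds(documents, reviewer, seed_terms=["merger"])
print(sum(coded.values()), "relevant;", len(unreviewed), "left unreviewed")
```

In this toy run the board-approval document contains no seed term, yet it rises to the top of the second round because the model learned the word "timeline" from the first batch.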
Validation
Test accuracy via random sampling and statistical analysis.
Ensure high recall rates to minimize risk of missing relevant documents.
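One common way to check recall is to draw a random sample from the documents the system set aside as non-relevant and count how many relevant documents it contains. The sketch below estimates recall from such a sample and attaches a Wilson score interval to the sampled rate. All figures are hypothetical, and the choice of a 95% interval (`z=1.96`) is an assumption; the appropriate sample size and confidence level depend on the matter.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def estimated_recall(relevant_found, discard_size, missed_rate):
    """Recall implied by a given rate of relevant documents in the discard pile."""
    missed = missed_rate * discard_size
    return relevant_found / (relevant_found + missed)

# Hypothetical figures for illustration only.
relevant_found = 9_000      # relevant documents identified by the review
discard_size = 500_000      # documents the system set aside as non-relevant
sample_size = 1_500         # random sample drawn from the discard pile
relevant_in_sample = 6      # relevant documents reviewers found in that sample

rate = relevant_in_sample / sample_size
low, high = wilson_interval(relevant_in_sample, sample_size)
point_recall = estimated_recall(relevant_found, discard_size, rate)
# The upper bound on the missed rate gives a conservative recall figure.
conservative_recall = estimated_recall(relevant_found, discard_size, high)
print(round(point_recall, 3), round(conservative_recall, 3))
```

Reporting the conservative figure alongside the point estimate makes clear how much of the result depends on sampling error.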
Documentation
Maintain audit trails of the review process to defend methodology in court.
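One way to make an audit trail harder to dispute is to chain its entries together with hashes, so that any later alteration is detectable. The sketch below is an illustrative assumption about how such a log could be kept, not a description of any particular review platform; the event names and fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def _digest(body):
    """Hash an entry body in a stable, key-sorted JSON form."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_entry(log, event, details):
    """Add an entry that records the hash of the entry before it."""
    body = {
        "seq": len(log),
        "event": event,
        "details": details,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    log.append({**body, "hash": _digest(body)})

def verify(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["seq"] != i or entry["prev_hash"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True

# Hypothetical review events.
log = []
append_entry(log, "seed_set_coded", {"documents": 500, "relevant": 62})
append_entry(log, "training_round", {"round": 1, "documents": 250, "relevant": 41})
append_entry(log, "validation_sample", {"sample_size": 1500, "relevant": 6})
print(verify(log))
```

Because each entry embeds the previous entry's hash, changing an early record invalidates every record after it.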
Integration with Legal Workflow
Combine predictive coding with keyword searches, document clustering, and deduplication for maximum efficiency.
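Deduplication is the simplest of these steps to illustrate. The sketch below removes exact duplicates by hashing normalized text before documents reach the predictive model. Normalizing only case and whitespace is an assumption made for the example; production tools typically also consider metadata, attachments, and near-duplicates.

```python
import hashlib

def normalize(text):
    """Collapse whitespace and case so trivial differences do not matter."""
    return " ".join(text.lower().split())

def deduplicate(documents):
    """Split (doc_id, text) pairs into unique ids and a duplicate-to-original map."""
    first_seen = {}
    unique, duplicates = [], {}
    for doc_id, text in documents:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in first_seen:
            duplicates[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            unique.append(doc_id)
    return unique, duplicates

# Hypothetical documents; DOC-003 repeats DOC-001 with different spacing and case.
documents = [
    ("DOC-001", "Please review the attached pricing schedule."),
    ("DOC-002", "Lunch is at noon on Friday."),
    ("DOC-003", "please  review the attached PRICING schedule."),
]
unique, duplicates = deduplicate(documents)
print(unique, duplicates)
```

Keeping the duplicate-to-original map means a coding decision made on one copy can be propagated to the others and explained later.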
Judicial Approval
Parties should seek early judicial approval or agreement to use predictive coding to avoid objections to the process.
6. Advantages and Challenges
| Aspect | Advantage | Challenge |
|---|---|---|
| Volume Handling | Efficient for millions of documents | Initial setup may be complex |
| Accuracy | Reduces human error | Requires proper training and validation |
| Cost | Significantly lower review cost | Investment in software/tools |
| Transparency | Audit trail available | Opposing party may challenge methodology |
| Speed | Accelerates review | Iterative training may require multiple rounds |
7. Key Takeaways
Predictive coding is now widely accepted by courts for large-volume document review.
Key factors for acceptance: defensibility, transparency, validation, and proportionality.
Courts generally favor predictive coding when it reduces cost and time without compromising fairness.
Proper planning, documentation, and early agreement with opposing parties are critical.
Conclusion
Predictive coding has changed how e-discovery is conducted, especially in complex, document-heavy disputes. With courts in the U.S., the UK, and other jurisdictions endorsing it, parties can use AI-assisted review to reduce costs, improve consistency, and speed up discovery, provided they follow defensible and transparent processes.