Use of Synthetic Datasets as Evidence in UK-Seated Arbitration
1. Conceptual Overview: Synthetic Datasets in Arbitration
Synthetic datasets are artificially generated data designed to replicate the statistical, structural, and behavioural properties of real-world datasets without reproducing actual underlying personal or proprietary data. In UK-seated arbitration, they commonly arise in disputes involving:
AI and algorithmic performance
Financial modelling and damages quantification
Data-sharing and data-protection-restricted environments
Cybersecurity and stress-testing claims
ESG, climate modelling, and infrastructure forecasting
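The core idea — retaining the statistical shape of real data while reproducing no actual record — can be sketched in a few lines. The following is a deliberately minimal illustration (a fitted normal distribution with a fixed seed), not a depiction of any generation method accepted in arbitral practice:

```python
import random
import statistics

def generate_synthetic(real_values, n, seed=42):
    """Sample synthetic values from a distribution fitted to real data.

    Only the fitted parameters (mean, stdev) are retained; no real
    record is copied into the synthetic output. The fixed seed makes
    the run replicable by an opposing expert.
    """
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# The "real" data stays private; only its distributional shape is mimicked.
real = [random.Random(1).gauss(100, 15) for _ in range(5000)]
synthetic = generate_synthetic(real, 5000)

print(round(statistics.mean(real)), round(statistics.mean(synthetic)))
```

Real generation pipelines are far more elaborate, but the evidential questions discussed below (what was fitted, what was assumed, what was disclosed) apply even to a sketch this simple.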
Under English law, the admissibility and probative value of such datasets are assessed functionally, not formally. The key question is not whether the data is “real,” but whether it is reliable, relevant, transparent, and procedurally fair.
2. Admissibility Framework Under English Arbitration Law
2.1 Arbitration Act 1996
Synthetic datasets fall within the tribunal’s broad evidentiary discretion under:
Section 34(1) – the tribunal decides all procedural and evidential matters, subject to the parties' right to agree any matter
Section 34(2)(f) – power to decide whether to apply strict rules of evidence as to the admissibility, relevance and weight of any material
English law imposes no exclusionary rule against synthetic or simulated data. Instead, the tribunal evaluates:
Method of generation
Assumptions embedded in the dataset
Scope of representativeness
Opportunity for challenge by the opposing party
3. Synthetic Data as Expert Evidence
In practice, synthetic datasets almost always enter proceedings as part of expert evidence, rather than standalone factual evidence.
Case Law 1: National Justice Compania Naviera SA v Prudential Assurance Co Ltd (The Ikarian Reefer) [1993]
Principle established
This case sets the foundational duties of experts under English law.
Relevance to synthetic datasets
When experts rely on synthetic data, they must:
Explain the generation methodology
Disclose all assumptions and limitations
Avoid presenting synthetic outputs as empirical fact
Failure to do so risks exclusion or severe reduction in evidentiary weight.
Case Law 2: Jones v Kaney [2011]
Principle established
The Supreme Court abolished expert witnesses' immunity from suit: experts owe duties of competence and honesty and may be liable for negligent opinions.
Relevance to synthetic datasets
Experts who construct or rely upon synthetic datasets:
Must ensure methodological robustness
Cannot hide behind “black-box” generation techniques
Face heightened scrutiny where the dataset substitutes unavailable real data
This encourages transparency in synthetic modelling.
4. Reliability, Models, and Simulations
Synthetic datasets are treated analogously to models and simulations, long recognised in English disputes.
Case Law 3: Imperial Chemical Industries Ltd v Merit Merrell Technology Ltd [2018]
Principle established
Courts (and by analogy tribunals) are sceptical of expert models that lack empirical grounding or validation.
Relevance to synthetic datasets
Tribunals assess whether:
The synthetic dataset is validated against real-world benchmarks
Sensitivity testing has been conducted
Alternative datasets or assumptions were considered
A synthetic dataset that merely reflects a party’s litigation position will carry little weight.
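The validation and sensitivity checks a tribunal looks for can be made concrete. The sketch below (an assumed, simplified check on basic moments only; real validation would test far more) compares a synthetic series against a real-world benchmark, and shows how a dataset engineered toward a litigation position fails the comparison:

```python
import random
import statistics

def validate(synthetic, benchmark, tolerance=0.1):
    """Check whether synthetic data tracks a real benchmark on basic moments.

    Returns a per-statistic pass/fail report using a relative-error
    tolerance. This is a toy stand-in for genuine statistical validation.
    """
    checks = {
        "mean": (statistics.mean(synthetic), statistics.mean(benchmark)),
        "stdev": (statistics.stdev(synthetic), statistics.stdev(benchmark)),
    }
    return {
        name: abs(s - b) / abs(b) <= tolerance
        for name, (s, b) in checks.items()
    }

rng = random.Random(7)
benchmark = [rng.gauss(50, 5) for _ in range(2000)]
good = [rng.gauss(50, 5) for _ in range(2000)]    # faithful generation
skewed = [rng.gauss(65, 5) for _ in range(2000)]  # litigation-friendly mean

print(validate(good, benchmark))
print(validate(skewed, benchmark))
```

The "skewed" series fails the mean check while passing on spread — illustrating why tribunals expect validation across multiple properties, not a single headline statistic.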
Case Law 4: British Airways plc v Spencer [2015]
Principle established
Statistical and reconstructed evidence is admissible, but its weight depends on methodological transparency.
Relevance to synthetic datasets
This case supports admitting synthetic data where:
Original data is incomplete, confidential, or lawfully inaccessible
The synthetic dataset is demonstrably representative
Margin of error is openly disclosed
Tribunals often rely on this reasoning where disclosure of the real dataset is barred by the UK GDPR or trade-secret constraints.
5. Procedural Fairness and Due Process
Synthetic datasets raise acute due-process concerns, particularly where generation methods are proprietary.
Case Law 5: Pakistan v Broadsheet LLC [2019]
Principle established
Procedural fairness requires equal opportunity to test and challenge evidence.
Relevance to synthetic datasets
In arbitration, this translates to:
Disclosure of dataset architecture and assumptions
Access to underlying generation logic (even if not source code)
Ability to run counter-simulations or critique parameters
If a synthetic dataset cannot be meaningfully tested, reliance on it may breach due-process norms.
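What a counter-simulation looks like in practice can be illustrated with a toy loss model (the model, its parameters, and the figures are all invented for illustration): the opposing expert re-runs the disclosed generation logic with its own preferred assumption and shows how sensitive the output is to that single input.

```python
import random
import statistics

def simulate_losses(growth_rate, volatility, n=1000, seed=11):
    """Toy loss projection under a disclosed growth assumption.

    The seed is part of the disclosed parameters, so both parties run
    the identical simulation and differ only in the contested input.
    """
    rng = random.Random(seed)
    return statistics.mean(
        100 * (1 + growth_rate) + rng.gauss(0, volatility) for _ in range(n)
    )

# Tendering party's disclosed assumptions...
claimed = simulate_losses(growth_rate=0.08, volatility=5)
# ...and a counter-simulation with the respondent's preferred growth figure.
counter = simulate_losses(growth_rate=0.02, volatility=5)
print(round(claimed, 1), round(counter, 1))
```

Because everything except the contested assumption is held fixed, the gap between the two runs isolates exactly how much the claimed figure depends on that assumption — which is the point of due-process access to the generation logic.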
6. Disclosure and Transparency Obligations
Case Law 6: Berezovsky v Abramovich [2012]
Principle established
Courts are cautious where evidence relies on opaque or unverifiable sources.
Relevance to synthetic datasets
Tribunals may:
Discount datasets where generation inputs are undisclosed
Infer adverse credibility where transparency is resisted
Require neutral expert review of the synthetic methodology
This reinforces that synthetic does not mean unchallengeable.
7. Weight vs Admissibility: Tribunal Practice
In UK-seated arbitration, synthetic datasets are rarely excluded outright. Instead:
They are admitted under Section 34
Subjected to rigorous cross-examination
Given variable weight depending on robustness
Tribunals distinguish between:
| Use of Synthetic Data | Likely Treatment |
|---|---|
| Gap-filling where real data unavailable | Generally acceptable |
| Stress-testing scenarios | High acceptance |
| Sole proof of historical fact | Treated cautiously |
| Damages quantification | Accepted if validated |
8. Interaction with Data Protection and Confidentiality
Synthetic datasets are increasingly relied upon to avoid unlawful disclosure of personal or confidential data.
English law implicitly supports this approach where:
Real data disclosure would breach statutory duties
Synthetic alternatives preserve analytical integrity
However, tribunals ensure that data protection is not used as a shield against scrutiny.
9. Standards Emerging in UK-Seated Arbitration
From case law and arbitral practice, the following standards are emerging:
Explainability – generation methods must be intelligible
Replicability – opposing experts must be able to test outcomes
Validation – synthetic data must be benchmarked
Proportionality – level of disclosure balanced against confidentiality
Neutrality – avoidance of litigation-driven data engineering
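The explainability and replicability standards above imply something concrete at the technical level: every input to the generation run should be disclosed so an opposing expert can reproduce it exactly. A minimal sketch of that discipline (a hypothetical JSON "parameter manifest"; real systems would disclose far more, such as model architecture and training configuration):

```python
import json
import random

def generate_disclosed(params_json):
    """Regenerate a synthetic series from a disclosed parameter manifest.

    Because every input (distribution parameters, sample size, seed) is
    in the manifest, an opposing expert can reproduce the run exactly.
    """
    p = json.loads(params_json)
    rng = random.Random(p["seed"])
    return [rng.gauss(p["mean"], p["stdev"]) for _ in range(p["n"])]

manifest = json.dumps({"mean": 0.0, "stdev": 1.0, "n": 100, "seed": 2024})
run_a = generate_disclosed(manifest)  # tendering party's run
run_b = generate_disclosed(manifest)  # opposing expert's re-run
print(run_a == run_b)  # identical output demonstrates replicability
```

A generation process that cannot be captured in some equivalent of this manifest — a "black box" — is precisely what attracts the discounted weight and adverse inferences discussed in sections 5 and 6.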
10. Conclusion
Under English law and the Arbitration Act 1996, synthetic datasets are admissible, but never presumptively reliable. UK-seated tribunals treat them as expert-dependent, model-based evidence, assessed through the lenses of:
Expert duties (The Ikarian Reefer)
Competence and accountability (Jones v Kaney)
Model reliability (ICI v Merit Merrell)
Statistical reconstruction (British Airways v Spencer)
Procedural fairness (Pakistan v Broadsheet)
Transparency (Berezovsky v Abramovich)
The trajectory of UK arbitration practice suggests increasing acceptance, coupled with intensifying scrutiny, ensuring that synthetic datasets enhance — rather than undermine — evidentiary integrity.