Legal Governance of Machine-Learning-Driven Digital Humanities Research Outputs

1. Introduction: Machine Learning in Digital Humanities

Digital humanities (DH) sits at the intersection of computing technologies and humanities disciplines such as history, literature, linguistics, and cultural studies. Increasingly, researchers use machine learning (ML) to:

Analyze large corpora of texts (e.g., historical documents, literature)

Generate visualizations or reconstructions

Produce insights, summaries, or even AI-generated creative content (like poetry or digital art)

These ML-driven outputs raise unique legal governance issues, including:

Copyright and IP ownership – Who owns AI-generated outputs? The researcher, the AI developer, or the institution?

Data privacy and protection – Using personal data (such as digitized letters or social media posts) may implicate the GDPR or similar laws.

Liability and accountability – If an ML model produces biased or offensive content, who is responsible?

Transparency and explainability – Legal frameworks increasingly demand that ML outputs be explainable, especially in research affecting public understanding or policy.

2. Legal Governance Areas and Relevant Cases

A. Copyright and Ownership

Key Issue: Can AI-generated research outputs in digital humanities be copyrighted?

Case 1: Naruto v. Slater (2018, U.S.)

Facts: A macaque monkey took a selfie using a photographer’s camera. The question arose: who owns the copyright?

Court Decision: The Ninth Circuit held that animals lack standing to sue under the Copyright Act; the U.S. Copyright Office separately maintains that works lacking human authorship cannot be registered at all.

Relevance to DH ML: Similarly, if a machine-learning model autonomously generates text or art, the AI itself cannot hold copyright; whether any human (operator, programmer, or institution) can claim ownership turns on the degree of human creative contribution.

Case 2: Thaler v. Commissioner of Patents (Australia, 2021–2022)

Facts: Stephen Thaler applied for a patent for an invention autonomously created by AI.

Decision: At first instance (2021), the Federal Court held that an AI system could be named as an inventor, but the Full Federal Court reversed this in 2022 (Commissioner of Patents v Thaler), holding that an inventor must be a natural person.

Implication for DH: ML-generated research outputs (such as reconstructed historical narratives) need an identifiable human contribution to support claims of authorship or inventorship.

B. Data Privacy and Protection

Key Issue: Using historical or personal datasets in ML research can implicate privacy laws.

Case 3: Google Spain v. AEPD (2014, CJEU)

Facts: The “right to be forgotten” case. Google was challenged over search results linking to outdated personal data.

Decision: The CJEU ruled that individuals can require search engines to delist results containing their personal data. The case was decided under the 1995 Data Protection Directive; the principle is now codified as the right to erasure in Article 17 of the GDPR.

DH Implication: If ML models in digital humanities use personal letters, diaries, or social media archives, researchers must ensure anonymization and consent. Even historical datasets may trigger privacy obligations if data subjects can be identified.
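As a concrete precaution, personal names can be replaced with stable pseudonyms before a corpus is fed to an ML pipeline. The sketch below is illustrative only (the sample letter and the simple string-replacement approach are invented for this example); a real project would combine named-entity recognition with manual review:

```python
import re

# Module-level mapping keeps pseudonyms stable across documents,
# so the same person always maps to the same token.
PSEUDONYMS = {}

def pseudonymize(text, names):
    """Replace known personal names with stable pseudonyms so that
    downstream ML training cannot trivially re-identify data subjects."""
    out = text
    for name in names:
        token = PSEUDONYMS.setdefault(name, f"PERSON_{len(PSEUDONYMS) + 1}")
        out = re.sub(re.escape(name), token, out)
    return out

# Invented example: a digitized historical letter.
letter = "Dear Anna Schmidt, I met Karl Weber in Vienna last spring."
clean = pseudonymize(letter, ["Anna Schmidt", "Karl Weber"])
print(clean)  # place names like "Vienna" would need separate handling
```

Note that pseudonymization alone may not satisfy the GDPR's definition of anonymization if the mapping table or contextual clues still allow re-identification.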

Case 4: hiQ Labs, Inc. v. LinkedIn Corp. (2022, U.S.)

Facts: hiQ used publicly available LinkedIn data to train algorithms for employee analytics. LinkedIn tried to block it.

Decision: The Ninth Circuit held that scraping publicly available data likely does not violate the Computer Fraud and Abuse Act (CFAA), though claims based on contract, platform terms, and privacy law remain available.

Relevance: ML researchers using scraped digital archives must balance open access with platform terms and data protection laws.
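One small, checkable part of that balance is respecting a site's robots.txt policy before scraping. The sketch below uses Python's standard-library parser against a hypothetical policy; robots.txt compliance is necessary hygiene but does not by itself satisfy platform terms of service or data protection law:

```python
from urllib.robotparser import RobotFileParser

def may_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Check a robots.txt policy before scraping a given path.
    This addresses only one layer of compliance."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# Hypothetical policy: archive pages are open, member profiles are not.
policy = """\
User-agent: *
Disallow: /profiles/
Allow: /archive/
"""

print(may_fetch(policy, "dh-research-bot", "/archive/letters/1912.html"))  # True
print(may_fetch(policy, "dh-research-bot", "/profiles/anna-schmidt"))      # False
```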

C. Liability for ML Outputs

Key Issue: Who is responsible if an ML system generates misleading or offensive content?

Case 5: State v. Loomis (Wisconsin Supreme Court, 2016)

Facts: A defendant challenged the use of COMPAS, a proprietary algorithmic risk-assessment tool, in sentencing, arguing it was biased and opaque.

Decision: The court upheld the use of COMPAS scores at sentencing, but required written cautions about the tool's limitations and held that a score cannot be the determinative factor in a sentence.

DH Implication: ML-generated research outputs in digital humanities (like AI-produced historical interpretations or visualizations) must include transparency and human review to avoid liability.
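In practice, "human review" is easiest to demonstrate when it leaves a paper trail. A minimal sketch of an audit record for ML-generated research outputs, assuming a hypothetical schema and names (this is an illustration, not a legal or institutional standard):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class OutputReview:
    """Record that a human reviewed an ML-generated research output.
    Field names are invented for this sketch."""
    model_name: str
    output_summary: str
    reviewed_by: str
    approved: bool
    notes: str = ""
    reviewed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Hypothetical review of an AI-produced historical interpretation.
record = OutputReview(
    model_name="narrative-reconstructor-v2",
    output_summary="AI-generated summary of an 1848 correspondence corpus",
    reviewed_by="dr.mueller",
    approved=True,
    notes="Checked for anachronisms and biased framing.",
)
print(record.reviewed_by, record.approved)
```

Keeping such records alongside published outputs makes the human-oversight claim verifiable rather than rhetorical.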

D. Fair Use and AI Training

Key Issue: Can copyrighted texts be used to train ML models without infringing copyright?

Case 6: Authors Guild v. Google (2015, U.S.)

Facts: Google scanned millions of books to create searchable snippets. Authors claimed copyright infringement.

Decision: Court ruled that Google’s use was transformative and fair use, because it created a searchable index rather than reproducing the books.

Implication: U.S. text-mining research can often rely on fair use where the purpose is transformative. Notably, Google's use was commercial and still fair, so transformativeness matters more than non-commerciality. Outside the U.S., researchers typically depend on statutory text-and-data-mining exceptions, such as Articles 3–4 of the EU's 2019 DSM Copyright Directive.
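The logic of Authors Guild v. Google can be illustrated in miniature: an index that supports search over a corpus but returns only short snippets, never full texts. A sketch with an invented two-document corpus:

```python
from collections import defaultdict

def build_index(corpus):
    """Build an inverted index: word -> set of document IDs.
    The index enables search without reproducing the works."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in text.lower().split():
            index[word.strip(".,;")].add(doc_id)
    return index

def snippet(text, term, width=30):
    """Return only a short context window around the term,
    mirroring the limited 'snippet view' upheld as fair use."""
    pos = text.lower().find(term.lower())
    return text[max(0, pos - width): pos + len(term) + width] if pos >= 0 else ""

# Invented corpus for demonstration.
corpus = {
    "doc1": "The whale surfaced near the ship at dawn.",
    "doc2": "No whale was sighted for three days.",
}
index = build_index(corpus)
for doc_id in sorted(index["whale"]):
    print(doc_id, "->", snippet(corpus[doc_id], "whale"))
```

The design choice that mattered in the case is visible here: users get search results and context, not a substitute for the original works.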

E. Ethical and Academic Governance

While not purely legal, several guidelines and cases highlight governance expectations:

Montreal Declaration for Responsible AI (2018) – Emphasizes transparency, accountability, and human-centered design.

OECD AI Principles (2019) – Advocates AI governance frameworks ensuring privacy, security, and fairness.

Case Analogy: Microsoft Corp. v. United States (2d Cir. 2016, the "Microsoft Ireland" case, later mooted by the CLOUD Act) – Shows that even digital humanities researchers using cloud-based ML must attend to jurisdictional conflicts over data stored abroad.

3. Summary Table of Key Legal Points

| Legal Area | Key Cases | Implications for ML-DH |
| --- | --- | --- |
| Copyright & Ownership | Naruto v. Slater; Thaler v. Commissioner of Patents | AI cannot own IP; human contribution required |
| Privacy & Data Protection | Google Spain v. AEPD; hiQ Labs v. LinkedIn | Consent, anonymization, and platform compliance required |
| Liability | State v. Loomis | Human oversight essential; avoid opaque decision-making |
| Fair Use / Training | Authors Guild v. Google | Transformative text mining often allowed |
| Ethical Governance | Montreal Declaration; OECD AI Principles | Transparency, fairness, accountability expected |

4. Key Takeaways

Human authorship is critical for legal recognition of AI outputs.

Data privacy compliance is mandatory, even for historical datasets.

Liability must be anticipated, and human oversight is legally advisable.

Fair use doctrines support text mining and ML training in research, but their availability varies by jurisdiction.

Ethical frameworks guide best practices, even where law is unclear.
