Legal Protection of AI-Generated Linguistic Corpora and Automated Translation Engines

17 Mar 2026

1. COPYRIGHT PROTECTION OF LINGUISTIC CORPORA

(A) Are Linguistic Corpora Protected?

Linguistic corpora (collections of text, speech, or translations) may be protected by copyright as compilations if:

They reflect human creativity in the selection or arrangement of their contents
They are more than raw facts or mechanically generated data

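Key Principle:
Copyright protects the original selection and arrangement of a corpus, not the underlying facts or data. Individual texts within the corpus may separately remain protected by their authors' copyright.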

(B) Case Law

1. Feist Publications, Inc. v. Rural Telephone Service Co. (1991, U.S.)

Facts:
Feist copied listings of names and telephone numbers from Rural's white-pages directory.

Held:
The U.S. Supreme Court ruled that:

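A mere collection of facts, such as an alphabetical list of names and numbers, lacks the originality copyright requires
“Sweat of the brow” (effort alone) is not enough; the selection or arrangement must show at least a minimal degree of creativity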

Relevance to AI Corpora:

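A raw compilation of linguistic data (e.g., bulk-scraped text) gains no compilation copyright from effort alone, although the individual texts it contains may still be protected by their authors' copyright
Curated datasets (selected, tagged, structured, annotated) may be protected if the selection or arrangement reflects creative choices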

2. Infopaq International A/S v. Danske Dagblades Forening (2009, CJEU)

Facts:
A media-monitoring company scanned newspaper articles and stored and printed 11-word extracts surrounding search terms.

Held:
Even an extract of 11 words can be a reproduction “in part” if it expresses the author's own intellectual creation.

Relevance:

AI training datasets containing excerpts may infringe copyright
Even short linguistic fragments in corpora can be protected

3. Authors Guild v. Google, Inc. (2015, U.S.)

Facts:
Google scanned millions of books to build a searchable index and displayed short snippets in search results.

Held:
The Second Circuit held that the scanning, search function, and snippet display were fair use

Relevance:

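Suggests that large-scale text ingestion for AI training may be lawful if the use is:
- Transformative
- Non-substitutive (it does not replace the market for the original works)
The court did not address AI training, so its support for AI corpus creation is by analogy only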

4. Eastern Book Company v. D.B. Modak (2008, India)

Facts:
Eastern Book Company claimed copyright in its copy-edited versions of Supreme Court judgments published in Supreme Court Cases (SCC).

Held:

Raw judgments = public domain
Edited versions reflecting skill, judgment, and a minimal degree of creativity = protected

Relevance:

AI corpora with annotations, tagging, and formatting may gain protection in India if those additions reflect skill, judgment, and a minimal degree of creativity

2. DATABASE RIGHTS (Especially EU Context)

(A) Protection of Structured Corpora

5. British Horseracing Board Ltd v. William Hill Organization Ltd (2004, CJEU)

Facts:
William Hill used horse-racing data derived from BHB's database in its online betting service.

Held:
The sui generis database right protects investment in obtaining, verifying, or presenting existing data, not investment in creating the data

Relevance:

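Corpora whose main investment lies in generating new data (e.g., machine-generated text) may not qualify
Substantial investment in seeking out, collecting, and verifying existing texts for a linguistic corpus may qualify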

6. Football Dataco Ltd v. Yahoo! UK Ltd (2012, CJEU)

Facts:
Football Dataco claimed copyright in English and Scottish football fixture lists as databases

Held:
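Copyright protects a database only if the selection or arrangement of its contents is the author's own intellectual creation; labour and skill spent creating the data themselves do not count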

Relevance:

AI-generated corpora may fail database copyright protection if:
- The effort lies mainly in generating the data rather than in creatively selecting or arranging it

3. COPYRIGHT IN AI-GENERATED OUTPUT (TRANSLATIONS)

(A) Are AI Translations Protected?

Key issue: Who is the author?

AI has no legal personality
Ownership may go to:
- Developer
- User
- No one (public domain)

7. Naruto v. Slater (2018, U.S.)

Facts:
A macaque took photographs of itself with a wildlife photographer's camera; PETA sued on the monkey's behalf, claiming copyright

Held:
Animals lack statutory standing to sue for copyright infringement under the U.S. Copyright Act

Relevance:

By analogy, an AI system cannot be the author of a translation it produces
This raises the question whether such outputs are protected at all

8. Thaler v. Commissioner of Patents and related DABUS cases (2021–2023, multiple jurisdictions)

Facts:
Stephen Thaler named his AI system, DABUS, as the inventor in patent applications

Held:

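Courts and patent offices in the US, the UK, and at the European Patent Office: an AI system cannot be named as inventor
Australia: the Federal Court initially allowed it (2021), but the Full Federal Court reversed that decision (2022)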

Relevance:

Although these are patent cases, the same human-centred reasoning is likely to apply to translation engines
An AI system is unlikely to be treated as the legal creator of linguistic output

4. PATENT PROTECTION FOR TRANSLATION ENGINES

(A) Can Translation Engines Be Patented?

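Potentially, if the claimed invention is:

Novel
Non-obvious (involves an inventive step)
Patent-eligible, i.e., more than an abstract idea implemented on a computer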

9. Alice Corp. v. CLS Bank International (2014, U.S.)

Facts:
Patents on a computerized scheme for mitigating settlement risk in financial transactions were challenged as invalid

Held:
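Claims to an abstract idea implemented on a generic computer are not patent-eligible unless they add an “inventive concept” that transforms the idea into a patent-eligible application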

Relevance:

Claims to translation as such, implemented on a generic computer, are likely ineligible
Claims reciting specific technical improvements in how a translation model operates may be patent-eligible

10. Diamond v. Diehr (1981, U.S.)

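Facts:
A process for curing synthetic rubber used a computer to calculate cure time with the Arrhenius equation and open the mold at the right moment

Held:
A process that uses a mathematical formula can be patent-eligible when it applies the formula within a technical or industrial process as a whole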

Relevance:

AI translation engines improving computational efficiency or accuracy may qualify

5. TRADE SECRET PROTECTION

AI linguistic corpora and translation engines are often protected as trade secrets.

11. Waymo LLC v. Uber Technologies Inc. (2017–2018, U.S.)

Facts:
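Waymo alleged that a former engineer downloaded confidential self-driving car (lidar) design files before founding a company that Uber acquired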

Outcome:
The case settled during trial in 2018, with Uber giving Waymo equity; there was no final ruling on misappropriation

Relevance:

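By analogy, training datasets and model weights can be:
- Proprietary
- Protected as trade secrets, provided they are kept confidential and derive commercial value from that secrecy
Many AI developers rely on confidentiality and contracts for this protection rather than on copyright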

6. TRAINING DATA AND FAIR USE / FAIR DEALING

(A) Key Issue:

Is using copyrighted text for AI training legal?

12. Authors Guild v. HathiTrust (2014, U.S.)

Held:
Scanning books for full-text search and for access by print-disabled readers = fair use

Relevance:

Arguably supports the legality of:
- AI training on large text corpora
- Language model development

13. American Geophysical Union v. Texaco Inc. (1994, U.S.)

Held:
Systematic photocopying of journal articles by a for-profit company's researchers for their own files was not fair use

Relevance:

Commercial AI training may face stricter scrutiny

7. KEY LEGAL ISSUES SUMMARIZED

(A) Ownership of AI Corpora

Raw data → generally not protected
Curated datasets → may be protected (creative selection or arrangement)
Database rights → limited applicability (mainly EU)

(B) Ownership of AI Translations

No AI authorship
Possible ownership:
- User (if creative input)
- Developer
- Public domain

(C) Infringement Risks

Use of copyrighted texts in training
Reproduction of protected fragments

(D) Protection Strategies

Copyright (for curated corpora)
Database rights (EU)
Patents (for technical implementations, not abstract algorithms)
Trade secrets (most common)
Contracts and licensing

8. EMERGING LEGAL TRENDS

Movement toward human-centric authorship
Ongoing, contested debate over whether fair use extends to AI training
Growing reliance on data licensing frameworks
Increasing litigation on:
- Training data
- Output similarity
- Ownership rights

CONCLUSION

Legal protection of AI-generated linguistic corpora and automated translation engines is fragmented and evolving:

Copyright law may protect curated, creatively structured datasets but not raw language or facts
AI-generated translations lack clear ownership
Patent law protects technical innovation, not abstract translation ideas
Trade secrets are often the most practical protection mechanism
Courts consistently emphasize human authorship and originality

Cases such as Feist, Infopaq, Google Books, and Thaler together indicate that while AI can process and generate language, legal ownership and protection still depend heavily on human involvement and legal structuring.
