Legal Protection of AI-Generated Linguistic Corpora and Automated Translation Engines

1. COPYRIGHT PROTECTION OF LINGUISTIC CORPORA

(A) Are Linguistic Corpora Protected?

Linguistic corpora (datasets of texts, speech, translations) may be protected if:

  • They involve human creativity in selection or arrangement
  • They are not merely raw facts or mechanically generated data

Key Principle:

Copyright protects original expression, not the underlying data or facts

(B) Case Law

1. Feist Publications, Inc. v. Rural Telephone Service Co. (1991, U.S.)

Facts:
Feist copied listings (names, towns, and telephone numbers) from Rural's white-pages directory without permission.

Held:
The U.S. Supreme Court ruled that:

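  • Facts are not copyrightable; a compilation is protected only if its selection or arrangement is original, and an alphabetical white-pages listing is not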
  • “Sweat of the brow” (effort alone) is not enough

Relevance to AI Corpora:

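  • Raw linguistic datasets (e.g., bulk-scraped text) give the collector no compilation copyright, although the individual texts may still be protected by their authors' copyright
  • Curated datasets (tagged, structured, annotated) can be protected where their selection or arrangement is original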

2. Infopaq International A/S v. Danske Dagblades Forening (2009, CJEU)

Facts:
Infopaq, a media monitoring company, stored and printed 11-word extracts from newspaper articles.

Held:
Even an 11-word extract may be protected if it expresses the author's own intellectual creation.

Relevance:

  • AI training datasets containing excerpts may infringe copyright
  • Even short linguistic fragments in corpora can be protected

3. Authors Guild v. Google, Inc. (2015, U.S.)

Facts:
Google digitized millions of books for search and indexing.

Held:
The Second Circuit held that scanning the books, making them searchable, and displaying limited snippets was fair use.

Relevance:

  • Large-scale text ingestion for AI training may be legal if:
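    • Transformative
    • Non-substitutive (not a market substitute for the original works)
  • Often cited in support of building AI language corpora, although the case concerned search and snippet display rather than AI training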

4. Eastern Book Company v. D.B. Modak (2008, India)

Facts:
Eastern Book Company claimed copyright in the copy-edited versions of Supreme Court judgments published in its law reports.

Held:

  • Raw judgments = public domain
  • Edited versions showing a minimal degree of creativity (not mere skill and labour) = protected

Relevance:

  • AI corpora with annotations, tagging, or formatting can gain protection in India where those additions reflect a minimal degree of creativity

2. DATABASE RIGHTS (Especially EU Context)

(A) Protection of Structured Corpora

5. British Horseracing Board Ltd v. William Hill Organization Ltd (2004, CJEU)

Facts:
William Hill reused race and runner data from BHB's database in its online betting service.

Held:
The sui generis database right protects substantial investment in obtaining, verifying, or presenting existing data, not investment in creating the data

Relevance:

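  • AI corpora built through low-cost automated scraping may not show the substantial investment required
  • Curated linguistic datasets involving substantial investment in collecting and verifying material may qualify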

6. Football Dataco Ltd v. Yahoo! UK Ltd (2012, CJEU)

Facts:
Football Dataco claimed copyright in English and Scottish football fixture lists as databases.

Held:
Copyright in a database requires originality in the selection or arrangement of its contents; skill and labour spent creating the data itself does not count

Relevance:

  • AI-generated corpora may fail database protection if:
    • Their value lies in generating the data (e.g., machine-generated text) rather than in an original selection or arrangement of collected material

3. COPYRIGHT IN AI-GENERATED OUTPUT (TRANSLATIONS)

(A) Are AI Translations Protected?

Key issue: Who is the author?

  • AI has no legal personality
  • Ownership may go to:
    • Developer
    • User
    • No one (public domain)

7. Naruto v. Slater (2018, U.S.)

Facts:
A crested macaque took selfies with photographer David Slater's camera; PETA sued on the monkey's behalf, claiming copyright in the photos.

Held:
Animals lack statutory standing to sue under the U.S. Copyright Act

Relevance:

  • By analogy, an AI system is unlikely to be recognized as the author of a translation
  • Raises the question of whether purely AI-generated outputs are protected at all

8. Thaler v. Commissioner of Patents and related DABUS cases (2021–2023, multiple jurisdictions)

Facts:
Stephen Thaler named his AI system, DABUS, as the inventor in patent applications

Held:

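  • Courts in the US and UK, and the European Patent Office: an AI system cannot be named as an inventor
  • Australia: a first-instance court allowed it (2021), but the Full Federal Court reversed (2022)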

Relevance:

  • Similar reasoning is likely to apply to the output of translation engines
  • An AI system is unlikely to be recognized as the legal creator of linguistic output

4. PATENT PROTECTION FOR TRANSLATION ENGINES

(A) Can Translation Engines Be Patented?

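Potentially, if they are:

  • Novel
  • Non-obvious (involve an inventive step)
  • Directed to patent-eligible subject matter (a technical improvement rather than an abstract idea)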

9. Alice Corp. v. CLS Bank International (2014, U.S.)

Facts:
Alice's patents claimed a computerized scheme for mitigating settlement risk in financial transactions

Held:
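Claims directed to an abstract idea are not patentable merely because they are implemented on a generic computer; they must add an "inventive concept" that amounts to significantly more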

Relevance:

  • Claims to translation as an abstract idea run on a generic computer = likely not patentable
  • AI models delivering specific technical improvements = potentially patentable

10. Diamond v. Diehr (1981, U.S.)

Held:
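A process for curing synthetic rubber that used a computer to apply a mathematical formula was patent-eligible, because the claim as a whole applied the formula within an industrial process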

Relevance:

  • AI translation engines improving computational efficiency or accuracy may qualify

5. TRADE SECRET PROTECTION

AI linguistic corpora and translation engines are often protected as trade secrets.

11. Waymo LLC v. Uber Technologies Inc. (2017, U.S.)

Facts:
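Waymo alleged that a former engineer took confidential self-driving car (lidar) files before joining Uber

Outcome:
The case settled during trial in 2018, with no final ruling on misappropriation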

Relevance:

  • Training datasets and models can be:
    • Proprietary
    • Protected as trade secrets
  • Many AI developers (e.g., Google, OpenAI) keep training data and model weights confidential, relying on trade secret protection alongside or instead of copyright

6. TRAINING DATA AND FAIR USE / FAIR DEALING

(A) Key Issue:

Is using copyrighted text for AI training legal?

12. Authors Guild v. HathiTrust (2014, U.S.)

Held:
Digitizing books to enable full-text search and access for print-disabled readers = fair use

Relevance:

  • Often cited in support of:
    • AI training on large text corpora
    • Language model development

13. American Geophysical Union v. Texaco Inc. (1994, U.S.)

Held:
Systematic photocopying of journal articles by a company's researchers for internal use was not fair use

Relevance:

  • Commercial AI training that copies whole works may face stricter fair use scrutiny

7. KEY LEGAL ISSUES SUMMARIZED

(A) Ownership of AI Corpora

  • Raw data → no compilation protection for the collector
  • Curated datasets → protected if selection or arrangement is original
  • Database rights (EU) → limited applicability

(B) Ownership of AI Translations

  • No AI authorship
  • Possible ownership:
    • User (if creative input)
    • Developer
    • Public domain

(C) Infringement Risks

  • Use of copyrighted texts in training
  • Reproduction of protected fragments

(D) Protection Strategies

  1. Copyright (for curated corpora)
  2. Database rights (EU)
  3. Patents (for algorithms)
  4. Trade secrets (most common)
  5. Contracts and licensing

8. EMERGING LEGAL TRENDS

  • Movement toward human-centric authorship
  • Arguments for extending fair use to AI training, still being tested in court
  • Growing reliance on data licensing frameworks
  • Increasing litigation on:
    • Training data
    • Output similarity
    • Ownership rights

CONCLUSION

Legal protection of AI-generated linguistic corpora and automated translation engines is fragmented and evolving:

  • Copyright law protects structured and creative datasets but not raw language
  • AI-generated translations lack clear ownership
  • Patent law protects technical innovation, not abstract translation ideas
  • Trade secrets remain the strongest protection mechanism
  • Courts consistently emphasize human authorship and originality

Cases such as Feist, Infopaq, Google Books, and Thaler collectively establish that while AI can process and generate language, legal ownership and protection still depend heavily on human involvement and legal structuring.
