Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

@cantankerous_cashew · 2 months ago

@[email protected] · 2 months ago

The notorious piracy database in question is Library Genesis.

@CriticalMiss · 2 months ago

Earlier reports suggested they trained it on books from Bibliotik.

What changed?

@halcyoncmdr · 2 months ago

Probably just both, honestly.

@[email protected] · 2 months ago

In for a penny, in for a pound.

@BetaDoggo_ · 2 months ago

The LLaMA-1 paper acknowledged the use of the books dataset, but LibGen isn’t mentioned in any of the papers, so this is new info.