Kate Knibbs reports in Wired magazine:

Against the company’s wishes, a court unredacted information alleging that Meta used Library Genesis (LibGen), a notorious so-called shadow library of pirated books that originated in Russia, to help train its generative AI language models. […] In his order, Chhabria referenced an internal quote from a Meta employee, included in the documents, in which they speculated, “If there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.” […] These newly unredacted documents reveal exchanges between Meta employees unearthed in the discovery process, like a Meta engineer telling a colleague that they hesitated to access LibGen data because “torrenting from a [Meta-owned] corporate laptop doesn’t feel right 😃”. They also allege that internal discussions about using LibGen data were escalated to Meta CEO Mark Zuckerberg (referred to as “MZ” in the memo handed over during discovery) and that Meta’s AI team was “approved to use” the pirated material.

  • @[email protected]
    1 day ago

    When I said “libgen is great because information should be free!” this isn’t what I meant… jeez

      • @[email protected]
        2 days ago

        The pivot-to-ai writeup is out, and they did seed! I assume it's documented, then.

        Multinational corporations can act ethically after all.

        • David Gerard
          21 hours ago

          It’s clear that they didn’t stop uploads of the torrents. It hasn’t been established in the documents we’ve seen so far that they actually had downloaders in turn. But they did clearly make the works available for upload.

          • @[email protected]
            20 hours ago

            They can; they just deliberately choose not to, most of the time.

            In total honesty, though, Meta has actually done some good things for Open Source. Sure, this is probably in their own interest, and it neither outweighs nor makes up for all the bad. But they can, and sometimes do.

  • monk
    2 days ago

    Nice! Now simply fine them a significant royalty payable to every author in there, say, a millicent per word of everything they generated before they got caught.

    • @JeeBaiChow
      2 days ago

      We should just start a meme movement that makes up an imaginary yet believable fact, like the lemmings-jumping-off-a-cliff thing, wait for the AIs to repeat it, and lobby for royalties. Do one for each of the major AI platforms (OpenAI, Reddit, Meta, Apple, Google, etc.). We would eventually find out which public forums are training which bots.

      • @trolololol
        2 hours ago

        You don't need that; all of them use everything.

      • monk
        2 days ago

        It doesn't even have to be believable; LLMs do not care.

        • @JeeBaiChow
          1 day ago

          And yet these are the things the investment bankers expect to take us to the next level lol

  • @[email protected]
    2 days ago

    So, as LibGen is blocked here in .nl by various providers (mine calls it thepiratebay for some reason), I look forward to all their LLMs being blocked.

  • @JeeBaiChow
    2 days ago

    I used to think they’d just train on every Facebook account that was ‘deleted’, i.e. removed from the public eye. This feels much worse.