• GamingChairModel
    2 days ago

    LLM companies have argued they should get to ignore all copyright, and now that some of their code has leaked, suddenly they care greatly about copyright.

    Anthropic itself has argued that digitizing works it bought and using the digitized copies to train models is fair use, so long as:

    • They don’t redistribute the physical copies they bought
    • They don’t allow an end user to retrieve the contents of any one specific work through the user interface (if you ask Claude to spit out the entire text of a copyrighted work used to train it, it is designed to resist copying too much out of a single work)

    So they don’t argue that copyright doesn’t count, exactly. They argue that copyright doesn’t prevent model training from ingesting an entire copyrighted work, as long as it’s done with so many other copyrighted works that any given original isn’t a huge contributor to the model or its outputs.

    There’s tension between their positions, but not so much that their argument totally falls apart.