With the latest announcement regarding Google allegedly paying Reddit $60 million per year for access to user-created content to train their AI, what is stopping companies from using the freely available information on the lemmyverse to do it for free?

How does everyone feel about the likelihood of this already happening, and should something be done about it?

  • @abhibeckert
    10 months ago

    Also not a lawyer but maybe more familiar with IP law than you are?

    When an AI scrapes the post you just wrote… how exactly were you, the author of the post, harmed by that action? You weren’t harmed, which is a powerful fair use defence. It’s not enough on its own, but it’s a huge step in that direction, and other factors, such as transforming the original, add to that, making a compelling case.

    Consider the most recent fair use case, in which Google negotiated to pay license fees for Java, then refused to pay and instead created its own copy of Java. It dragged on in court a long time and bounced back and forth on appeal, but in the end the ruling came down to “Java is protected by copyright, but Sun was not sufficiently harmed, therefore it was fair use”. Or at least that’s where it was headed when Oracle (who bought Sun years after the infringement happened) decided to stop burning mountains of cash fighting a lawsuit that wasn’t likely to end well for them.

    I was somewhat surprised by that case - I felt the fact that Google had talks about paying, then decided not to pay, was pretty clear harm. But the judge didn’t see that as real harm - Java’s source code is not ‘free as in freedom’ but it is ‘free as in dollars’ to download, and therefore not really properly protected by copyright. The fact that the license added restrictions on what you can do with the copy you were given for free didn’t hold up in court (which has pretty widespread ramifications for the GPL… I wonder who will be brave enough to test that in court… the FSF isn’t going to back down from a lawsuit like Oracle did).

    Anyway, if Java is borderline, I think the fediverse is clear-cut. Almost any copy of the fediverse would be fair use. Yes, it’s technically copyrighted content, but there’s a loophole so big it surrounds the entire universe.