• FaceDeer · 33 points · 1 year ago

    Did anyone expect them to go “oh, okay, that makes sense after all”?

  • @[email protected]
    link
    fedilink
    English
    251 year ago

    At the crux of the author’s lawsuit is the argument that OpenAI is ruthlessly mining their material to create “derivative works” that will “replace the very writings it copied.”

    The authors shoot down OpenAI’s excuse that “substantial similarity is a mandatory feature of all copyright-infringement claims,” calling it “flat wrong.”

    Goodbye Star Wars, Avatar, Tarantino’s entire filmography, every slasher film since 1974…

    • Queen HawlSera · 15 points · 1 year ago (edited)

      It actually reminds me of a sci-fi series I read where, in the future, an AI scans every new work to see what big-corporation intellectual property may have been used as an influence, in order to halt the production of any new media not tied to a pre-existing IP, including 100% of independent and fan-made works.

      Which is one of the contributing factors towards the apocalypse. So 500 years later, after the apocalypse has been reversed and human colonies are enjoying post-scarcity, one of the biggest fads is rediscovering the 20th century, now that all the copyrights have expired and people can datamine the ruins of Earth to find all the media that couldn’t be properly preserved heading into Armageddon, thanks to copyright trolling.

      It’s referred to in universe as “Twencen”

      The series is called FreeRIDErs if anyone is curious. Unfortunately, the series may never get a conclusion (untimely death of a co-creator), but most of its story arcs were finished, so there’s still a good chunk of meat to chew through and I highly recommend it.

    • @[email protected]
      link
      fedilink
      English
      91 year ago

      OpenAI is trying to argue that the whole work has to be similar to infringe, but that’s never been true. You can write a novel and infringe on page 302, and that’s a copyright infringement. OpenAI is trying to change the meaning of copyright because otherwise the output of their model is oozing with various infringements.

      • Echo Dot · 0 points · 1 year ago (edited)

        I can quote work that’s already been published; that’s allowable, and I don’t have to get the author’s consent to do it. I don’t have to get consent because I’m not passing the work off as my own, I am quoting it with reference.

        So if I ask the AI to produce something in the style of Stephen King no copyright is violated because it’s all original work.

        If I ask the AI to quote Stephen King (and it actually does it) then it’s a quote and it’s not claiming the work is its own.

        Under the current interpretation of copyright law (and current law is broken beyond belief, but that’s a completely different issue) a copyright breach has not occurred in either scenario.

        The only argument I can see working is that if the AI can actually quote Stephen King, that will prove it has the works of Stephen King in its data set, but that doesn’t really prove anything other than that the works of Stephen King are in its data set. It doesn’t definitively prove OpenAI didn’t pay for the works.

        • @[email protected]
          link
          fedilink
          English
          5
          edit-2
          1 year ago

          You can quote a work under fair use, and whether it’s legal depends on your intent. You have to be quoting it for such uses as “commentary, criticism, news reporting, and scholarly reports.”

          There is no cheat code here. There is no loophole that LLMs can slide on through. The output of LLMs is illegal. The training of LLMs without consent is probably illegal.

          The industry knows that its activity is illegal, and its strategy is not to win but rather to make litigation expensive, complex, and slow through such tactics as:

          1. Diffusion of responsibility: the companies compiling the list of training works, gathering those works, training on those works, and prompting the generation of output are all intentionally different entities. The strategy is that each entity can claim “I was only doing X; the actual infringement is when that guy over there did Y.”
          2. Diffusion of infringement: so many works are being infringed that it becomes difficult, especially on the output side, to say who has been infringed and who has standing. What’s more, even in clear-cut cases (for instance, when I give an LLM a prompt and it regurgitates some nontrivial, recognizable copyrighted work), the LLM trainer will say you caused the infringement with your prompt! (see point 1)
          3. Pretending to be academic in nature so they could wrap themselves in the thick blanket of affirmative defense that fair use doctrine affords the academy, and then, after the training portion of the infringement has occurred (insisting that was fair use because it was being used in an academic context), “whoopseeing” it into a commercial product.
          4. Just being super cagey about the details of the training sets that were actually used and how they were used. This kind of stuff is discoverable, but you have to get to discovery first.
          5. And finally, magic brain box arguments. These are typically some variation of “all artists have influences.” It is a rhetorical argument that would be blown right past in court, but it muddies the public discussion and is useful to them in that way.

          Their purpose is not to win. It’s to slow everything down, and limit the number of people who are being infringed who have the resources to pursue them. The goal is that if they can get LLMs to “take over” quickly then they can become, you know, too big and too powerful to be shut down even after the inevitable adverse rulings. It’s classic “ask for forgiveness, not permission” silicon valley strategy.

          Sam Altman’s goal in creeping around Washington is to try to get laws changed to carve out exceptions for exactly the types of stuff he is already doing. And it is just the same thing SBF was doing when he was creeping around Washington trying to get a law that would declare his securitized ponzi tokens to be commodities.

          • Echo Dot · 3 points · 1 year ago

            There is no cheat code here.

            No one said there was one. This isn’t about looking for a way to break the law and get away with it; this is about the people who want the law to work a particular way not understanding that it doesn’t actually work that way.

            The output of LLMs is illegal.

            No, it’s not. There is no way in which the output of an AI can be illegal. All that can be proven is that the various providers did not pay for the various licences, but that’s not the same as saying the output is automatically a crime; if it were, we wouldn’t even need the case. The law is incredibly vague in this area.

            Sam Altman’s goal in creeping around Washington is to try to get laws changed to carve out exceptions for exactly the types of stuff he is already doing.

            Yes, and that’s a good thing. Think about it for 15 seconds. If it weren’t for people like him, AI would be limited to the mega-corporations who can afford the licences. We don’t want that; we want AI technology to be available to anyone, we want AI technology to be open source. None of that can happen if the law does not change.

            You seem to be under the impression there is some evil, sadistic overlord here trying to force artificial intelligence on a world that does not want it, but nothing could be further from the truth. If anything, artificial intelligence is being developed in a way that is surprisingly egalitarian considering the corporations that are investing in it, and vague, unclear, unhelpful, broken copyright law is getting in the way of that.

        • d-RLY? · 2 points · 1 year ago

          It doesn’t definitively prove openAI didn’t pay for the works.

          But they are a business/org that has all of those works and is using them for profit. So it would be provable whether OpenAI did or didn’t pay for the correct licenses, as they and/or the publisher/Stephen King (if he were to handle those agreements directly) would have a receipt/license document of some kind to show it. I don’t agree with how copyrights are done and agree that things should enter the public domain much sooner. But a for-profit thing like OpenAI shouldn’t just be allowed all these exceptions that avoid needing any level of permission, or paying for the works that require it. At least not while us regular people, who aren’t using these sources for profit/business, also aren’t allowed to just use whatever we want.

          The only way that (I at least) see such open use of everything, at the level of all this data/information, being fine is in a socialist/communist system of some kind. The main reason for keeping stuff like entertainment/information/art/etc. at a creator level is to have money to live in modern society, where basic and crucial needs (food/housing/healthcare/etc.) cost money. So for the average author/writer/artist/inventor, a for-profit company just being able to take their work much more directly impacts their ability to live.

          It is a highly predatory level of capitalism and should not have exceptions. It is just setting up a different version of the shit that also needs to be stopped in the entertainment/technology industries, where the actual creators/performers/etc. are fucked by the studios/labels/corps by not being paid anywhere near the value being brought in, and may not have control over their work. So all of the companies and the capitalist system are why a private entity/business/org shouldn’t just be allowed to pull this shit.

    • @[email protected]
      link
      fedilink
      English
      1
      edit-2
      1 year ago

      Speaking of slasher films, does anybody know of any movies that have terrible everything except a really good plot?

  • @mrcleanup · 19 points · 1 year ago

    I think the place we haven’t quite gotten to yet is that copyright is probably the wrong law for this. What the AI is doing is reverse engineering the author’s magic formula for creating new works, which would likely fall under patent law.

    In the past this hasn’t really been possible for a person to do reliably, and it isn’t really quantifiable as far as filing a patent for your process goes, yet the AI does it anyway, leaving us in a weird spot.

    • @Mandarbmax · 19 points · 1 year ago

      US patent professional here

      Ya, saying it isn’t possible to do under patent law is no exaggeration. Even making the patent applications possible to allow would require changes to 35 U.S.C. 112 (A, and probably also B) and 35 U.S.C. 101. This all assumes that all authors would have the time, money, and energy to file a patent, which even with a good attorney means many, many hours of work; filing pro se would be like writing a whole new book. After the patent is allowed, the cost of continuation applications to account for changes in the process as the author learns and grows would be a hellish burden. After this comes the 20-year lifespan of a patent (assuming all maintenance fees are paid, which is quite the assumption; those are not cheap), at which point the patent protections are dead and the author needs to invent a new process to be protected. Don’t even get me started on enforcing a patent.

      Patent law is fundamentally flawed, to be sure, but even if every author had infinite money and time to file patents, the changes needed to patent law to let them do so would leave patent law utterly broken for other purposes.

      Using patent law for this is a good idea to bring up, but for the above reasons I don’t think it is viable at all. It would be better and more realistic to have Congress change copyright law than to change patent law, I think. Sadly, I don’t think that is particularly likely either. :(

      • @[email protected]
        link
        fedilink
        6
        edit-2
        1 year ago

        Can’t they just create a brand new law, specifically to cover these use-cases we have here?

        Edit: And thank you for your detailed answer! It was educational.

        • @Mandarbmax · 4 points · 1 year ago

          They could, and imho (I’m not an expert on this) they probably should. This would fall under unfucking copyright, though, or perhaps under a new thing alongside copyright and patent law (though that sounds like more work than updating copyright law). Amending it into patent law would be the toughest option. The simple answer as to why I think that is that the vibes are off.

          As a rough analogy it would be like combating public flashers by changing the rules for the department of transportation rather than the criminal justice system (ignoring how fucked the criminal justice system is).

      • @Mandarbmax · 10 points · 1 year ago

        I don’t know if I would say more broken; at least patents have limits on how long they can exist, putting an upper bound on how much damage they can cause. Then again, limiting the production of vaccines during a pandemic is a lot more urgent than letting people make Mickey Mouse cartoons, so the standard for what counts as broken has to be a lot more stringent. It is more important for patent law not to be broken than it is for copyright law, so the same amount of brokenness feels worse with patents.

    • Franzia · 2 points · 1 year ago

      What the AI is doing is reverse engineering the authors magic formula for creating new works

      Great, but the humans involved knowingly let it scrape pirated works.

  • AutoTL;DR (bot) · 15 points · 1 year ago

    This is the best summary I could come up with:


    ChatGPT creator OpenAI has been on the receiving end of two high profile lawsuits by authors who are absolutely livid that the AI startup used their writing to train its large language models, which they say amounts to flaunting copyright laws without any form of compensation.

    One of the lawsuits, led by comedian and memoirist Sarah Silverman, is playing out in a California federal court, where the plaintiffs recently delivered a scolding on ChatGPT’s underlying technology.

    At the crux of the author’s lawsuit is the argument that OpenAI is ruthlessly mining their material to create “derivative works” that will “replace the very writings it copied.”

    The authors shoot down OpenAI’s excuse that “substantial similarity is a mandatory feature of all copyright-infringement claims,” calling it “flat wrong.”

    It can brag that it’s a leader in a booming AI industry, but in doing so it’s also painted a bigger target on its back, making enemies of practically every creative pursuit.

    High profile literary luminaries behind that suit include George R. R. Martin, Jonathan Franzen, David Baldacci, and legal thriller maestro John Grisham.


    The original article contains 369 words, the summary contains 180 words. Saved 51%. I’m a bot and I’m open source!

  • archomrade [he/him] · 15 points · 1 year ago

    Copyright is already just a band-aid for what is really an issue of resource allocation.

    If writers and artists weren’t at risk of losing their means of living, we wouldn’t need to concern ourselves with the threat of an advanced tool supplanting them. Nevermind how the tool is created, it is clearly very valuable (otherwise it would not represent such a large threat to writers) and should be made as broadly available (and jointly-owned and controlled) as possible. By expanding copyright like this, all we’re doing is gatekeeping the creation of AI models to the largest of tech companies, and making them prohibitively expensive to train for smaller applications.

    If LLM’s are truly the start of a “fourth industrial revolution” as some have claimed, then we need to consider the possibility that our economic arrangement is ill-suited for the kind of productivity it is said AI will bring. Private ownership (over creative works, and over AI models, and over data) is getting in the way of what could be a beautiful technological advancement that benefits everyone.

    Instead, we’re left squabbling over who gets to own what and how.

    • Franzia · 6 points · 1 year ago

      “fourth industrial revolution” as some have claimed

      The people claiming this are often the shareholders themselves.

      prohibitively expensive to train for smaller applications.

      There is so much work out there for free, with no copyright. The biggest cost in training is most likely the hardware, and I see no added value in having AI train on Stephen King ☠️

      Copyright is already just a band-aid for what is really an issue of resource allocation.

      God damn right, but I want our government to put a band-aid on capitalists just stealing whatever the fuck they want, “move fast and break things.” It’s yet another test for my confidence in the state. Every issue is a litmus test for how our society deals with the problems that arise.

      • archomrade [he/him] · 3 points · 1 year ago

        There is so much work out there for free, with no copyright

        There’s actually a lot less than you’d think (since copyright lasts for so long), and even less now that online and digitized sources are being locked down and charged for by the domain owners. But even if it were abundant, it would likely not satisfy the true concern here. If there were enough data to produce an LLM of similar quality without using copyrighted data, it would still threaten the security of those writers. What is to say a user couldn’t provide a sample of Stephen King’s writing to the LLM and have it produce derivative work without it having been trained on copyrighted data? If the user had paid for that work, are they allowed to use the LLM in that way? If they aren’t, who is really at fault, the user or the owner of the LLM?

        The law can’t address the complaints of these writers because interpreting the law to that standard is simply too restrictive and sets an impossible standard. The best way to address the complaint is to simply reform copyright law (or regulate LLM’s through some other mechanism). Frankly, I do not buy that the LLM’s are a competing product to the copyrighted works.

        The biggest cost in training is most likely the hardware

        That’s right for large models like the ones owned by OpenAI and Google, but with the amount of data needed to effectively train and fine-tune these models, if that data suddenly became scarce and expensive it could easily overtake hardware cost. To say nothing for small consumer models that are run on consumer hardware.

        capitalists just stealing whatever the fuck they want “move fast and break things”

        I understand this sentiment, but keep in mind that copyright ownership is just another form of capital.

        • Franzia · 1 point · 1 year ago (edited)

          Thanks for this reply. You’ve shown this issue has depth that I’ve ignored because I like very few of the advocates for the AI we’ve got.

          So one thing that trips me up is that I thought copyright was about use. As a consumer rather than a creator this makes complete sense: you can read it if you own it or borrowed it, and you do not distribute it in any way. But there are also gentleman’s agreements built into how we use books and digital prints.

          Unintuitively, copying is also very important. Artists copy to learn, for example. Musicians have the right to cover anyone’s music. Engineers will deconstruct and reverse engineer another’s solution. And businesses cheat off of one another all the time; even when it has been proven to be wrong, the incentive is high.

          So is taking the text of the book, no matter how you got it, and using it as part of a new technology okay?

          Clearly the distribution isn’t wrong. You’re not distributing the book; you’ve made a derivative.

          The ownership isn’t there; I mean, the works were pirated. We’ve been taught that simply having something gotten through online copying is not only against the ‘rightholder’ but “piracy” and “stealing.” I have a really simplistic view of this: I just want creators paid for their work, and to have autonomy (rights) over what is done with their work. This is rarely the case; we live in a world with publishers.

          So it’s that first action. Is that use of the text in another work legal?

          My basic understanding of fair use is that it applies when you add to a work. You critique or reuse that work. Your work is about the other work, but it is also something new that stands on its own, like an essay or a collage, rather than a collection.

          I am so confused. Text-based AI is run by capitalists, and we only have it FOSS because Meta can afford to lose money in order to remove OpenAI from the competition. Image-based AI is almost certainly wrong; it copied and plugged in all of this other work, and now tons of people are suing. Getty Images is leveraging their rights management to make an AI that follows the rules we are living with. My gut reaction is a lot of people deserve royalties.

          But on the other hand, it sounds like AI did not work until they gave it the entire internet’s worth of data to train on. Was training on smaller, legal sets a failure? Or maybe it was because they took the tech approach of training the AI on every Google image of dogs, or cats, etc., without any real variation. Because they’re engineers, not artists. And not even good engineers, if their best work is just scraping other people’s work and feeding it to this weird computer program.

          This is all just stealing, right? But stealing is a lot more legal than I thought, especially when it comes to digitally published works of art, or physically published art that’s popular enough to be shared online.

  • ZILtoid1991 · 14 points · 1 year ago

    seethe

    Very concerning word use from you.

    The issue art faces isn’t that there’s not enough throughput, but rather there’s not enough time, both to make them and enjoy them.

    • @[email protected]
      link
      fedilink
      151 year ago

      That’s always been the case, though, imo. People had to make time for art. They had to go to galleries, see plays and listen to music. To me it’s about the fair promotion of art, and the ability for the art enjoyer to find art that they themselves enjoy rather than what some business model requires of them, and the ability for art creators to find a niche and to be able to work on their art as much as they would want to.

  • Beej Jorgensen · 6 points · 1 year ago

    “substantial similarity is a mandatory feature of all copyright-infringement claims”

    Is that not a requirement? Time for me to start suing people!

  • @NounsAndWords · 5 points · 1 year ago

    I take it we don’t use the phrase “good writers borrow, great writers steal” in this day and age…

    • AphoticDev · 10 points · 1 year ago

      Wait till they find out photographers spend their whole careers trying to emulate the style of previous generations. Or that Adobe has been implementing AI-driven content creation into Photoshop and Lightroom for years now, and we’ve been pretending we don’t notice because it makes our jobs easier.

    • Franzia · 9 points · 1 year ago

      Writers are rich because they’ve made artwork and sold it. I personally hold that to a higher value than CEOs.

  • @[email protected]
    link
    fedilink
    -14
    edit-2
    1 year ago

    Amazing how every new generation of technology has a generation of users of the previous technology who do whatever they can to stop its advancement. This technology takes human creativity and output to a whole new level, it will advance medicine and science in ways that are difficult to even imagine, it will provide personalized educational tutoring to every student regardless of income, and these people are worried about the technicality of what the AI is trained on and often don’t even understand enough about AI to even make an argument about it. If people like this win, whatever country’s legal system they win in will not see the benefits that AI can bring. That society is shooting themselves in the foot.

    Your favorite musician listened to music that inspired them when they made their songs. Listening to other people’s music taught them how to make music. They paid for the music (or somebody did via licensing fees or it was freely available for some other reason) when they listened to it in the first place. When they sold records, they didn’t have to pay the artist of every song they ever listened to. That would be ludicrous. An AI shouldn’t have to pay you because it read your book and millions like it to learn how to read and write.

    • Allseer · 22 points · 1 year ago

      You’re humanizing the software too much. Comparing software to human behavior is just plain wrong. GPT can’t even reason properly yet. I can’t see this as anything other than a more advanced collage process.

      Open used intellectual property without consent of the owners. Major fucked.

      If ‘anybody’ does anything similar to tracing, copy&pasting or even sampling a fraction of another person’s imagery or written work, that anybody is violating copyright.

      • @NounsAndWords · 7 points · 1 year ago (edited)

        If ‘anybody’ does anything similar to tracing, copy&pasting or even sampling a fraction of another person’s imagery or written work, that anybody is violating copyright.

        Ok, but tracing is literally a part of the human learning process. If you trace a work and sell it as your own that’s bad. If you trace a work to learn about the style and let that influence your future works that is what every artist already does.

        The artistic process isn’t copyrighted, only the final result. The exact same standards can apply to AI generated work as already do to anything human generated.

        • Allseer · 2 points · 1 year ago

          i don’t know the specifics of the lawsuit but i imagine this would parallel piracy.

          in a way you could say that Open has pirated software directly from multiple intellectual properties. Open has distributed software which emulates skills and knowledge. remember this is a tool, not an individual.

          • @[email protected]
            link
            fedilink
            English
            10
            edit-2
            1 year ago

            It’s not exactly the same thing, but here’s an article by Kit Walsh, a senior staff attorney at the EFF, that explains how image generators work within the law. The two aren’t exactly the same, but you can see how the same ideas would apply. The EFF is a digital rights group who most recently won a historic case: border guards now need a warrant to search your phone.

            Here are some excerpts:

            First, copyright law doesn’t prevent you from making factual observations about a work or copying the facts embodied in a work (this is called the “idea/expression distinction”). Rather, copyright forbids you from copying the work’s creative expression in a way that could substitute for the original, and from making “derivative works” when those works copy too much creative expression from the original.

            Second, even if a person makes a copy or a derivative work, the use is not infringing if it is a “fair use.” Whether a use is fair depends on a number of factors, including the purpose of the use, the nature of the original work, how much is used, and potential harm to the market for the original work.

            And:

            …When an act potentially implicates copyright but is a necessary step in enabling noninfringing uses, it frequently qualifies as a fair use itself. After all, the right to make a noninfringing use of a work is only meaningful if you are also permitted to perform the steps that lead up to that use. Thus, as both an intermediate use and an analytical use, scraping is not likely to violate copyright law.

            I’d like to hear your thoughts.

            • Allseer · 2 points · 1 year ago

              Thanks for the sauce. It’s very enlightening.

              It does trouble me to think that the creators of Stable Diffusion could be financially punished. Did they at least try to compensate the artists in any way?

              It “feels” as though it parallels consultation. These creatives are literally paid for their creations. If a software constructs a neural network to emulate intellectual property, does that count as consultation? Could/should it apply to the software developers or individuals using the software?

              From the technical side, I don’t understand how all the red flags aren’t already there. The source material was taken, and now any individual can acquire that exact material, or anything “in the spirit of” that material, through a single service. Is this a new way to pirate?

              Stable Diffusion is a great opportunity for small businesses, especially in an increasingly anti-small-business America (maybe that’s just California?). I’d hate for it to become inaccessible to creators who would wield it properly.

              As long as creatives retain the ability to sue the bad actors, I’m glad. I personally don’t need Open or whoever is directly responsible for Stable Diffusion and its training data to be punished.

              • @[email protected]
                link
                fedilink
                English
                3
                edit-2
                1 year ago

                In the US, fair use lets you use copyrighted material without permission for criticism, research, artistic expression like literature, art, music, satire, and parody. It balances the interests of copyright holders with the public’s right to access and use information. There are rights people can maintain over their work, and there are rights they do not maintain. We are allowed to analyze people’s publicly published works, and that’s always been to the benefit of artistic expression. It would be awful for everyone if IP holders could take down any criticism, reverse engineering, or indexes they don’t like. That would be the dream of every corporation, bully, troll, or wannabe autocrat.

                The consultation angle is interesting, but I’m not sure it applies here. Consultation usually involves a direct and intentional exchange of information and expertise, whereas this is an original analysis of data that doesn’t emulate any specific intellectual property.

                I also don’t think this is a new way to pirate, as long as you don’t reproduce the source material. If you wanted to do that, you could just right-click and “save as”. What this does is lower the bar for entry to let people more easily exercise their rights. Like print media vs. internet publication and TV/Radio vs. online content, there will be winners and losers, but if done right, I think this will all be in service of a more decentralized and open media landscape.

      • @[email protected]
        link
        fedilink
        21 year ago

        sampling a fraction of another person’s imagery or written work.

        So citing is a copyright violation? A scientific discussion on a specific text is a copyright violation? This makes no sense. It would mean your work couldn’t build on anything else, and that’s plain stupid.

        Also to your first point about reasoning and advanced collage process: you are right and wrong. Yes, an LLM doesn’t have the ability to use all the information a human has or be as precise, therefore it can’t reason the same way a human can. BUT, and that is a huge caveat, the inherent goal of AI, and in its simplest form neural networks, was to replicate human thinking. If you look at the brain and then at AIs, you will see how close the process is. It’s usually giving the AI an input, the AI tries to give the desired output, then the AI gets told what the output should have looked like, and then it backpropagates to reinforce its process. This is already pretty advanced and human-like (even look at how the brain is made up and then how AI models are made up, it’s basically the same concept).

        Now you would be right to say “well in its simplest form LLMs like GPT are just predicting which character or word comes next” and you would be partially right. But in that process it incorporates all of the “knowledge” it got from its training sessions and a few valuable tricks to improve. The truth is, differences between a human brain and an AI are marginal, and it mostly boils down to efficiency and training time.
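        The loop described above (input → attempt → correction → backpropagation) can be sketched in a few lines. Here’s a toy single-weight example in plain Python, purely illustrative; real networks have billions of weights and use frameworks, but the train-compare-correct cycle is the same idea:

```python
# Toy version of the training loop described above: forward pass,
# compare with the desired output, backpropagate to adjust the weight.
# A single "neuron" learning the rule y = 2*x.

def train(steps=1000, lr=0.01):
    w = 0.0  # the model's one weight, initially untrained
    data = [(x, 2 * x) for x in range(1, 5)]  # desired behaviour: y = 2x
    for _ in range(steps):
        for x, target in data:
            out = w * x            # the AI "tries to give the desired output"
            error = out - target   # it "gets told what it should have looked like"
            w -= lr * error * x    # backpropagation step: nudge w to reduce error
    return w

print(train())  # converges to ~2.0
```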

        And to say that LLMs are just “an advanced collage process” is like saying “a car is just an advanced horse”. You’re not technically wrong, but the description is really misleading if you look into the details.

        And for detail’s sake, this is the paper for Llama 2, the latest big LLM from Facebook, which is said to be the current standard for LLM development:

        https://arxiv.org/pdf/2307.09288.pdf

        • @SuddenlyBlowGreen
          link
          -51 year ago

          Well, there’s still a shit ton we don’t understand about humans.

          We do, however, understand everything about machine learning.

          • Dr Cog
            link
            fedilink
            -11 year ago

            LOL

            We understand less about how LLMs generate a single output than we do about the human brain. You clearly have no experience developing models.

            • @SuddenlyBlowGreen
              link
              4
              edit-2
              1 year ago

              Well, given that we’re the ones who developed the models, that they’re deterministic once we save and reproduce the random weights they’re given during training, and that we can use a debugger to step through every single step a model makes in learning and “thinking”, yes, we understand them.

              We cannot, however, do that for the human brain.
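              The reproducibility point is easy to demonstrate at toy scale (a sketch using Python’s stdlib RNG, not a real training run): fix the seed and the “random” initial weights come out identical every time, which is what lets a run be replayed.

```python
import random

def init_weights(seed, n=5):
    # A fixed seed makes the "random" initialization fully reproducible,
    # which is what allows saving and replaying a training run.
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(n)]

print(init_weights(42) == init_weights(42))  # True: identical weights
print(init_weights(42) == init_weights(43))  # False: different seed
```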

              • Dr Cog
                link
                fedilink
                31 year ago

                You really don’t understand how these models work and you should learn about them before you make statements about them.

                Machine learning models are, almost by definition, non-deterministic.

                • @SuddenlyBlowGreen
                  link
                  01 year ago

                  We know the input, we can set the model to save the weights in checkpoints during training and can view them at any time, we can see the weights of the finished model, and we can see the code.

                  If what you said about LLMs being a complete black box were true, we wouldn’t be able to reproduce models, and each model would be unique.

                  But we can control every step of the training process, and we can reproduce not just the finished model, but the model at every single step during training.

                  We created the math, we created the training sets, we created the code and we can see and modify the weights and any other property of the model.

                  What exactly do we not understand?

      • @[email protected]
        link
        fedilink
        29
        edit-2
        1 year ago

        No that’s not how it works. It stores learned information like “word x is more likely to follow word y than word a” or “people from country x are more likely to consume food a than b”. That is what is distributed when the AI model is shared. To learn that, it just reads books zillions of times and updates its table of likelihoods. Just like an artist might listen to a Lil Wayne album hundreds of times and each time they learn a little bit more about his rhyme style or how beats work or whatever. It’s more complicated than that, but that’s a layperson’s explanation of how it works. The book isn’t stored in there somewhere. The book’s contents aren’t transferred to other parties.
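        A crude sketch of that “table of likelihoods” idea, word-pair counts only (real models learn vastly richer statistics than this, but the point stands: what gets stored is counts and weights, not the book):

```python
from collections import Counter, defaultdict

def learn_bigrams(text):
    # Count which word tends to follow which -- the "word x is more likely
    # to follow word y" statistic described above. The text itself is not
    # kept; only the tallies survive.
    words = text.lower().split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

model = learn_bigrams("the cat sat on the mat and the cat slept")
print(model["the"].most_common(1))  # [('cat', 2)]
```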

        • Madison_rogue
          link
          fedilink
          5
          edit-2
          1 year ago

          The learning model is artificial, vs a human that is sentient. If a human learns from a piece of work, that’s fine if they emulate styles in their own work. However, sample that work, and the original artist is due compensation. This was a huge deal in the late 80s with electronic music sampling earlier musical works, and there are several cases of copyright that back original owners’ claim of royalties due to them.

          The lawsuits allege that the models used copyrighted work to learn. If that is so, writers are due compensation for their copyrighted work.

          This isn’t litigation against the technology. It’s litigation around what a machine can freely use in its learning model. Had ChatGPT, Meta, etc., used works in the public domain this wouldn’t be an issue. Yet it looks as if they did not.

          EDIT

          And before someone mentions that the books may have been bought and then used in the model, it may not matter. The Birthday Song is a perfect example of copyright that caused several restaurant chains to use other tunes up until the copyright was overturned in 2016. Every time the AI uses the copied work in its output it may be subject to copyright.

          • @LemmysMum
            link
            81 year ago

            I can read a copyrighted work and create a work from the experience and knowledge gained. At what point is what I’m doing any different from the A.I.?

            • @[email protected]
              link
              fedilink
              41 year ago

              For one thing: when you do it, you’re the only one who can express that experience and knowledge. When the AI does it, everyone can express that experience and knowledge. It’s kind of like the difference between artisanal and industrial. There’s a big difference of scale that has a great impact on the livelihood of the creators.

              • @LemmysMum
                link
                31 year ago

                Yes, it’s wonderful. Knowledge might finally become free in the advent of AI tools and we might finally see the death of the copyright system. Oh how we can dream.

                • Phanatik
                  link
                  fedilink
                  01 year ago

                  I’m not sure what you mean by this. Information has always been free if you look hard enough. With the advent of the internet, you’re able to connect with people who possess this information and you’re likely to find it for free on YouTube or other websites.

                  Copyright exists to protect against plagiarism or theft (in an ideal world). I understand the frustration that comes with archaic laws and updates that move at a glacial pace; however, the death of copyright harms more people than you’re expecting.

                  Piracy has existed as long as the internet has. Companies have been complaining ceaselessly about lost profits, but once LLMs came along, they’re fine with piracy as long as it’s masked behind a glorified search algorithm. They’re fine with cutting jobs and replacing them with an LLM that produces lower-quality output at significantly cheaper rates.

            • BraveSirZaphod
              link
              fedilink
              2
              edit-2
              1 year ago

              There is a practical difference in the time required and sheer scale of output in the AI context that makes a very material difference on the actual societal impact, so it’s not unreasonable to consider treating it differently.

              Set up a lemonade stand on a random street corner and you’ll probably be left alone unless you have a particularly Karen-dominated municipal government. Try to set up a thousand lemonade stands in every American city, and you’re probably going to start to attract some negative attention. The scale of an activity is a relevant factor in how society views it.

            • Phanatik
              link
              fedilink
              21 year ago

              For one thing, you can do the task completely unprompted. The LLM has to be told what to do. On that front, you have an idea in your head of the task you want to achieve and how you want to go about doing it, the output is unique because it’s determined by your perceptions. The LLM doesn’t really have perceptions, it has probabilities. It’s broken down the outputs of human creativity into numbers and is attempting to replicate them.

              • @LemmysMum
                link
                2
                edit-2
                1 year ago

                The AI does have perceptions, fed into it by us as inputs. I give the AI my perceptions, the AI creates a facsimile, and I adjust the perceptions I feed into it until I receive an output that meets my requirements; no different from doing it myself, except I didn’t need to read all the books and learn all the lessons myself. I still tailor the end product, just not at the same micro scale that we needed to traditionally.

                • Phanatik
                  link
                  fedilink
                  11 year ago

                  You can’t feed it perceptions any more than you can feed me your perceptions. You give it text, and the quality of the output is determined by how the LLM has been trained to understand that text. If by feeding it perceptions you mean what it’s trained on, I have to remind you that the reality GPT is trained on is the one dictated by the internet, with all of its biases. The internet is not a reflection of reality; it’s how many people escape from reality and share information. It’s highly subject to survivorship bias. If the information doesn’t appear on the internet, GPT is unaware of it.

                  To give an example, if GPT gives you a bad output and you tell it that it’s a bad output, it will apologise. This seems smart but it’s not really. It doesn’t actually feel remorse, it’s giving a predetermined response based on what it’s understood by your text.

          • Heratiki
            link
            fedilink
            English
            41 year ago

            The creator of ChatGPT is sentient. Why couldn’t it be said that this is their expression of the learned works?

              • Heratiki
                link
                fedilink
                English
                21 year ago

                I’ve glanced at these a few times now and there are a lot of ifs, ands, and buts in there.

                I’m not understanding how an AI itself infringes on copyright, as it has to be directed in its creation at this point (GPT specifically). How is that any different from me using a program that finds a specific piece of text and copies it for use in my own document? In that case the document would be presented by me, and thus I would be infringing, not the software. AI (for the time being) is simply software and incapable of infringement. And suing a company that makes an AI simply because it used data to train its software isn’t infringement either, since the works aren’t copied verbatim from their original source unless specifically requested by the user. That would put the infringement on the user.

                • Phanatik
                  link
                  fedilink
                  11 year ago

                  There’s a bit more nuance to your example. The company is liable for building a tool that allows plagiarism to happen. That’s not down to how people are using it, that’s just what the tool does.

          • Kichae
            link
            fedilink
            2
            edit-2
            1 year ago

            It’s litigation around what a machine can freely use in its learning model.

            No, its not that, either. It’s litigation around what resources a person can exploit to develop a product without paying for that right.

            The machine is doing nothing wrong. It’s not feeding itself.

      • Dudewitbow
        link
        fedilink
        51 year ago

        It’s less about copying the work; it’s more about looking at patterns that appear in a work.

        To give a very rudimentary example: if I wanted a word and the first letter was Q, what would the second letter be?

        Of course, statistically, the next letter is u; it’s not common for words starting with Q to have a different letter after that. ML/AI is like taking these small situations but with a ridiculous number of parameters, coming up with something based on several internal models. These parameters of course generally have some context.

        It’s like being told to read a book thoroughly and afterwards being told to reproduce it. You probably couldn’t make it 1:1, but you could get the general gist of the story. The difference between you and the machine is that the machine has read a lot of books and contextually knows the patterns, so it can generate something similar faster and more accurately, but not an exact copy of the original.
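        The Q-followed-by-u example can be written out directly (a toy sketch; real models condition on far more context than a single letter):

```python
from collections import Counter, defaultdict

def next_letter_stats(words):
    # Tally which letter follows which across a list of words --
    # the rudimentary "Q is almost always followed by u" pattern.
    stats = defaultdict(Counter)
    for w in words:
        for a, b in zip(w, w[1:]):
            stats[a][b] += 1
    return stats

stats = next_letter_stats(["queen", "quick", "quote", "quilt"])
print(stats["q"].most_common(1))  # [('u', 4)]
```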

      • @[email protected]
        link
        fedilink
        21 year ago

        When you download Vicuna or Stable Diffusion XL, they’re a handful of gigabytes. But when you go download LAION-5B, it’s 240TB. So where did that data go if it’s being copy/pasted and regurgitated in its entirety?
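        Rough arithmetic makes the point (ballpark figures only: LAION-5B is ~5.85 billion image-text pairs totalling ~240 TB of source images, and a diffusion checkpoint is a few GB; the exact numbers are assumptions):

```python
# Back-of-the-envelope: how much of each training image could possibly
# be "stored" in the model, if it were a copy-paste machine?
dataset_bytes = 240e12  # ~240 TB of source images (approximate)
model_bytes = 7e9       # ~7 GB checkpoint (assumed ballpark)
pairs = 5.85e9          # ~5.85 billion image-text pairs

print(round(dataset_bytes / pairs))  # ~41,000 bytes of source per pair
print(model_bytes / pairs)           # ~1.2 bytes of model per pair
```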

        • @[email protected]
          link
          fedilink
          21 year ago

          Exactly! If it were just outputting exact data, they wouldn’t bother making new works and would just pivot to being the world’s greatest source of compression.

          Though there is some work where researchers have heavily modified these models to overfit and do exactly that.

    • @[email protected]
      link
      fedilink
      31 year ago

      I don’t think that Sarah Silverman and the others are saying that the tech shouldn’t exist. They’re saying that the input to train them needs to be negotiated as a society. And the businesses also care about the input to train them because it affects the performance of the LLMs. If we do allow licensing, watermarking, data cleanup, synthetic data, etc. in a way that is transparent, I think it’s good for the industry and it’s good for the people.

      • Dr Cog
        link
        fedilink
        21 year ago

        I don’t need to negotiate with Sarah Silverman if I’m handed her book by a friend, and neither should an AI

        • @[email protected]
          link
          fedilink
          21 year ago

          But you do need to negotiate with Sarah Silverman if you take that book, rearrange the chapters, and then try to sell it for profit. Obviously that’s extremified, but it’s the argument they’re making.

          • Dr Cog
            link
            fedilink
            101 year ago

            I agree. But that isn’t what AI is doing, because it doesn’t store the actual book and it isn’t possible to reproduce any part in a format that is recognizable as the original work.

          • Heratiki
            link
            fedilink
            English
            61 year ago

            That’s not what this is. To use your example it would be like taking her book and rearranging ALL of the words to make another book and selling that book. But they’re not selling the book or its contents, they’re selling how their software interprets the book for the benefit of the user. This would be like suing teachers for teaching about their book.

          • @[email protected]
            link
            fedilink
            61 year ago

            Definitely not how that output works. It will come up with something that seems like a Sarah Silverman-created work but isn’t. It’s like claiming copyright on impersonations. I don’t buy it

            • Heratiki
              link
              fedilink
              English
              71 year ago

              Yes. Imagine how much trouble ANY actor would be in if they were sued for impersonating someone nearly identical but not that person. If Sarah Silverman ever interacted with a person and then imitated that person on stage for her own personal benefit without the other persons express consent it would be no different. And comedians pick up their comedy from everything around them both natural and imitation.

              • @[email protected]
                link
                fedilink
                41 year ago

                100%. I just can’t get behind any of these arguments against AI from this segment of workers. This is no different than other rallies against technological evolution due to fear of job losses. Their scarce commodity will soon disappear and that’s what they’re actually afraid of.

                • Heratiki
                  link
                  fedilink
                  English
                  11 year ago

                  It’s easy. They’re grasping at straws because their career isn’t what it used to be. It’s something new and viral so it must be an easy target to exploit for money. Personally I’d be on top of it and setting up contracts to allow AI to use my likeness for a small subset of the usual pay. I just can’t imagine not taking advantage of the ability to do absolutely nothing and still get paid for it. Instead they appear to actively be trying to tear it down. If they were wanting to set guidelines then they would be rallying congress not suing a company based on how you FEEL it should be.

        • Madison_rogue
          link
          fedilink
          01 year ago

          Except the AI owner does. It’s like sampling music for a remix or integrating that sample into a new work. Yes, you do not need to negotiate with Sarah Silverman if you are handed a book by a friend. However if you use material from that book in a work it needs to be cited. If you create an IP based off that work, Sarah Silverman deserves compensation because you used material from her work.

          No different with AI. If the AI used intellectual property from an author in its learning algorithm, then, if that intellectual property is used in the AI’s output, the original author is due compensation under certain circumstances.

          • Dr Cog
            link
            fedilink
            10
            edit-2
            1 year ago

            Neither citation nor compensation are necessary for fair use, which is what occurs when an original work is used for its concepts but not reproduced.

            • @SheeEttin
              link
              English
              51 year ago

              Sure, but fair use is rather narrowly defined. You must consider the purpose, nature, amount, and effect. In the case of scraping entire bodies of work as training data, the purpose is commercial, the nature is not in the public interest, the amount is the work in its entirety, and the effect is to compete with the original author. It fails to meet any criteria for fair use.

              • Dr Cog
                link
                fedilink
                11 year ago

                The work is not reproduced in its entirety. Simply using the work in its entirety is not a violation of copyright law, just as reading a book or watching a movie (even if pirated) is not a violation. The reproduction of that work is the violation, and LLMs simply do not store the works in their entirety nor are they capable of reproducing them.

                • @SheeEttin
                  link
                  English
                  21 year ago

                  It doesn’t have to be reproduced to be a copyright violation, only used. For example, publishing your Harry Potter fanfic would be infringement. You’re not reproducing the original material in any way, but you’re still heavily depending on it.

          • @[email protected]
            link
            fedilink
            41 year ago

            It is different. That knowledge from her book forms part of your processing and allows you to extract features and implement similar outputs yourself. The key difference between the AI module and dataset is that it’s codified in bits, versus whatever neural links we have in our brain. So if one theoretically creates a way to codify your neural network you might be subject to the same restrictions we’re trying to levy on ai. And that’s bullshit.

    • HubertManne
      link
      fedilink
      31 year ago

      it’s a bit more than that if the ai is told to make something in the style of.

        • HubertManne
          link
          fedilink
          41 year ago

          yeah, again, they can’t crank out a new one every 5 minutes, and actually it would overwhelm the courts, as it’s very easy for those works to be too similar. Take the guy who tried to sue Disney by writing a book based on Finding Nemo when he found out they were making a story like that. He was shady and tried to play timeline games, but he did not need to make a story just like it.

    • Franzia
      link
      fedilink
      -11 year ago

      Amazing how every generation of technology has an asshole billionaire or two stealing shit to be the first in line to try and monopolize society’s progress.

  • @ShittyRedditWasBetter
    link
    -141 year ago

    All these writers are going to lose in the long run. Just learn to use the tech FFS.

    • @[email protected]
      link
      fedilink
      71 year ago

      This is not about learning to use the tech. This is about how the tech allows the studios to radically cut work for writers, or drive their wages down. This is about distributing the wealth that comes from LLMs and the new gen of AI.

      • @SheeEttin
        link
        English
        11 year ago

        And directly using existing work without compensating the author for it.

      • @ShittyRedditWasBetter
        link
        -2
        edit-2
        1 year ago

        Lost cause. The demand WILL go down. Those that embrace it and learn to refine the generated work will be successful. You ain’t putting this genie back in the bottle.