It has finally happened…not surprised though.

    • @[email protected]
      link
      fedilink
      7
      6 months ago

      I don’t use ChatGPT and host my own models ✅️ I don’t use Reddit, I use Lemmy ✅️

      I agree with the dystopian part. It’s not a warm and fuzzy feeling, this feeling. A feeling that’s too common for my liking these days.

  • @[email protected]
    link
    fedilink
    English
    25
    6 months ago

    To be fair, everything we post online can be used for training. Reddit is just made for money :P I'm kinda using Lemmy more for posting now and Reddit just for browsing, like an archive.

  • @swooosh
    link
    16
    6 months ago

    It’s crazy that Reddit doesn’t have to ask anyone whether they want to contribute. It shows who really owns and controls your posts.

    • @Deckweiss
      link
      11
      edit-2
      6 months ago

      The actual crazy thing is:

      Imagine if somebody ran a Lemmy instance, subscribed to every sublemmy, and scraped all the data without asking. Nobody would even notice.

      Reddit owns the content posted on their platform. But when you post on lemmy, everybody owns it, including every data company large and small.

      But hey, at least we feel good about our social media platform choice, ’cause it’s federated and open source or whatever, right?
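
      The point above about effortless scraping doesn’t even require running an instance or federating: Lemmy exposes a public, unauthenticated HTTP API for listing posts. A minimal sketch (the instance and community names are placeholders; the endpoint shape follows Lemmy’s documented v3 API):

      ```python
      # Hedged sketch: building a request against Lemmy's public post-listing
      # endpoint (/api/v3/post/list). Public content needs no authentication,
      # which is why wholesale scraping needs nobody's permission.
      # "lemmy.ml" and "technology" below are placeholder examples.
      from urllib.parse import urlencode


      def post_list_url(instance: str, community: str,
                        page: int = 1, limit: int = 50) -> str:
          """Construct the URL for Lemmy's public post-list endpoint."""
          params = urlencode({
              "community_name": community,
              "page": page,
              "limit": limit,
              "sort": "New",
          })
          return f"https://{instance}/api/v3/post/list?{params}"


      # Actually fetching is then just a plain GET, e.g.:
      #   urllib.request.urlopen(post_list_url("lemmy.ml", "technology"))
      # and paging through `page` walks the whole community.
      ```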

      • Alphane Moon
        link
        fedilink
        19
        6 months ago

        I would say a good base assumption is that all content on the public internet is scraped and used for AI schemes.

        It’s the other factors that matter.

      • @swooosh
        link
        5
        edit-2
        6 months ago

        Like Facebook’s Threads?

        Everyone can use it. With Reddit’s posts, only Reddit can do it.

      • @[email protected]
        link
        fedilink
        5
        edit-2
        6 months ago

        I’m fine with everybody owning Lemmy because that’s the point of it: users make it so users can use its data, as opposed to one asshole owning and ruining Reddit’s user-made content in his own pursuit of more money, for himself only.

  • @[email protected]
    link
    fedilink
    6
    edit-2
    6 months ago

    This is not so bad. Reddit is crawling with bot spam, and that will only increase as users leave the platform every time it pulls a stunt to pump the stock price. The ratio of real to fake content will shrink and poison the training pipelines. It’s a great experiment to test model collapse in real time, really.

  • Tomkoid
    link
    fedilink
    3
    6 months ago

    I like how they monetized their API and data because they didn’t want it used to train AI models, and now they’re selling user data to OpenAI for millions.

    • @[email protected]
      link
      fedilink
      English
      2
      6 months ago

      It’s not any different from how it already was. Initially, the GenAI models were all trained on masses of unlicensed data, including data from Reddit. The problem is that some companies, like the New York Times, are suing over LLMs trained on their data. So in response, companies like OpenAI are now trying to reach partnerships that basically license the use of the data (which they already had). This also means they will keep future access to that data as long as the partnership is in place, whereas companies without a partnership could start banning scraping activity or updating their terms to forbid training AI on their data.

      Overall these partnerships are a good thing; licensed training data is good. But from a privacy standpoint, the AI models were already trained on Reddit data. This just formalizes the relationship.