• @[email protected]
    82 · 2 months ago

    Can we get a list of companies NOT doing this? I’d assume it’s going to be much shorter.

    • @[email protected]
      41 · 2 months ago

      All these AI and machine learning companies are taking content directly from websites and ignoring robots.txt files.

      If your content is able to be crawled, even without being listed on search engines, I don’t think it really matters.
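
For context on why robots.txt offers so little protection here: the file is purely advisory, and a crawler has to choose to consult it. Below is a minimal sketch of the check a well-behaved crawler would run, using Python's standard-library `urllib.robotparser`. The rules and the `ExampleAIBot` user agent are hypothetical, made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. "ExampleAIBot" is a made-up crawler
# name used only for illustration, not a real user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The blocked crawler versus any other crawler, on the same URL.
    print(is_allowed("ExampleAIBot", "https://example.com/post/1"))
    print(is_allowed("SomeOtherBot", "https://example.com/post/1"))
```

A scraper that never runs a check like this is not stopped by anything in the file, which is the behavior the comment above describes.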

      • @T156
        8 · 2 months ago

        It might help proof an AI company against legal issues that might be brought about by their using the content. If they’re ever sued by Automattic, then they can just point to the deal and say that they bought the data from them. There’s much less ambiguity.

        • @[email protected]
          3 · 2 months ago

          You are correct, about the legal stuff. These companies are being sued all the time.

          Doing this deal also makes processing the data a lot easier. Being handed a big ass database would be a lot easier than crawling for content.

          What I posted was about how they operate. These companies showed time and time again that they don’t really care what data they are taking or from whom. They will even take their own AI or machine learning content and put it in their own system.

  • @[email protected]
    66 · 2 months ago

    I work in marketing, and every client I work with who has a WordPress website is using AI to write a lot of their content. This is going to lead to circularly trained AI for sure.

      • @NewAgeOldPerson
        7 · 2 months ago

        No way for me to know. My programming doesn’t allow it.

    • (des)mosthenes
      2 · 2 months ago

      pretty sure this only applies to .com wordpress not self hosted

      • @[email protected]
        1 · 2 months ago

        Not sure, especially since they compare it to the Squarespace deal which I believe is for all sites built on the platform.

          • @[email protected]
            1 · 2 months ago

            My misunderstanding. But it looks like you need a .org to self-host WP, and like 99% of WP-built sites are .com as far as I’ve seen. I definitely do not know the technicals about different ways to host/build on the same platform, so I certainly defer to you there, but in any case, my bet is that any site/platform that gets scraped indiscriminately will lead to a lot of circular AI training.

            • @[email protected]
              1 · 2 months ago

              There are A LOT of self hosted Wordpress sites out there. Many of them you wouldn’t know unless told they were Wordpress (I believe both The Verge and TechCrunch use self hosted Wordpress). I myself have two self hosted Wordpress sites. Though I’ve been considering moving away from Wordpress for a while now.

              • @[email protected]
                2 · 2 months ago

                Yeah there are def more self hosted than not. Wordpress.org is just the site for the open source project. Most hosting sites come with 1 click WordPress installs. I’ve built so many sites with it.

  • @[email protected]
    44 · 2 months ago

    I’m assuming this just relates to WordPress.com rather than the open-source WordPress.org but it’s still a bummer. I’ve worked with the open source platform for over a dozen years and have started to kinda loathe what it’s turned into but I’m not sure I’m yet at the point where I’m ready to migrate a bunch of sites to something else. This could be that push if they keep going down this road.

    God, am I getting too old for this shit? I’m a pretty technical person but this AI nonsense is just relentless. I’m not philosophically against the idea of AI, as like any tool it has the potential to better the world, but every tech company and their dog are going all in on using it for commercial bullshit that seems to provide very little value to society. Even fucking Mozilla is going in that direction.

    • @[email protected]
      20 · 2 months ago

      Mozilla seems more towards local and privacy preserving AI Dev, no? Both are really lacking in the space IMHO

      Like I’m not interested in what the collective of digital knowledge looks like behind several corporate filters and a giant rent-seeking moat.

      • @[email protected]
        6 · 2 months ago

        True, and I get that realistically they do need to diversify away from Firefox … but it still feels bandwagoney to me given that seemingly every tech company (and Wendy’s) are piling into the AI train all at once. Like I said, though, I think I’m just getting too old for this.

        • @[email protected]
          3 · 2 months ago

          They were already doing some good work in the field before but they trended away from it.

          Honestly it just seems like they struggle with follow through.

      • kingthrillgore
        0 · 2 months ago

        Mozilla’s business is sucking up to Google for that vendor money they spend to avoid litigation (and it’s not working).

          • kingthrillgore
            3 · 2 months ago

            Google gives Mozilla its money to appear that they aren’t trying to corner the browser space with Chrome. If they win the argument in court that they aren’t monopolizing, they don’t have to give Mozilla shit anymore.

    • Traister101
      11 · edited 2 months ago

      It’s the new NFTs and Crypto but it’s not blatantly a scam so the companies that skipped out on those sure as shit will be hopping onto AI

    • kingthrillgore
      5 · 2 months ago

      There are already several WordPress plugins to block out generative AI. I expect the community to have a less than chipper attitude toward Automattic over this.

    • @CosmoNova
      3 · edited 2 months ago

      I don’t really know what to say to cheer you up. Industrial revolutions are as important and exciting as they are painful, even dreadful to many. I’ve seen no signs of this one being different. There will be a lot of losers before we can expect widespread benefits for society from it. The current working class will suffer great losses and will have to fight so another can reap the benefits later.

  • @EdibleFriend
    40 · 2 months ago

    Bro…tumblr is full of some WEIRD FUCKIN SHIT YO

      • @EdibleFriend
        20 · 2 months ago

        I know because I was one of those weirdos lol

        • swayevenly
          5 · 2 months ago

          Got 'em.

          Sad they’re doing this with Tumblr though. It was fun but I just deleted my 10+ year old account.

          • @EdibleFriend
            3 · 2 months ago

            haha it’s been about that long since I even logged into mine.

    • @nickhammes
      15 · 2 months ago

      I, for one, am looking forward to the rise of generative AI trained on 2014 tumblr, hallucinating Superwholock jokes where they don’t belong, cosplayers dyeing themselves grey in a bathtub, and DashCon references where nobody expects them

      • @EdibleFriend
        11 · 2 months ago

        Bro this shit is gonna make AI UwU

  • donuts
    33 · 2 months ago

    Funny how all of these social media platforms that were so happy to describe themselves as “the public town square of the internet” or whatever are now claiming that they own everything that everyone ever posted. So, which is it? Because it obviously cannot be both.

  • @[email protected]
    25 · 2 months ago

    Shit like this should be opt in by default. But no. Instead of respecting the users they count on ignorance, forgetfulness, and obfuscation for this kind of fuckery.

  • @gofsckyourself
    24 · edited 2 months ago

    I always thought it was scummy as fuck that WordPress.org, a 501c3 nonprofit, is allowed to funnel business to WordPress.com which is a completely separate for-profit entity.

    They are even allowed to trick people into thinking they are the same by using the name and trademarks, which they explicitly state you cannot do. But wp.com gets a free pass for some reason? Scummy as fuck.

    • @TORFdot0
      5 · 2 months ago

      Yeah I’ve never liked Wordpress. But it’s pretty much the de facto CMS for noobs. I always have used my own self-built CMSs on frameworks like Laravel but it’s not really practical for non-tech people or even businesses to self develop their own CMS unless they have really specific needs.

      I’m going to be honest, I didn’t even realize that Wordpress.org existed and was a non-profit; I just thought making the source available was something they did because you can’t really not do that as a PHP framework.

  • Kid_Thunder
    23 · 2 months ago

    It’s crazy that it sounds like paying customers might also have to opt-out.

  • @SuperSynthia
    18 · 2 months ago

    Not only am I really glad to not be on tumblr, but this further shows I shouldn’t use wordpress for my website even though there is an open-source version

    • kingthrillgore
      1 · edited 2 months ago

      WordPress is either:

      • overkill for a lot of users, when static site generators do the job faster and easier
      • underkill when you have topology, data types, logic, and content pipeline challenges, for which Drupal is king but far more complex
  • @LunaCtld
    13 · 2 months ago

    I welcome this change actually. Now users can clearly see what others have been saying forever: If you don’t pay for the product, you ARE the product.

    • @NOT_RICK
      9 · 2 months ago

      And sometimes when you pay you’re still the product. Smart TVs, Oculus, etc

    • @[email protected]
      6 · 2 months ago

      If you don’t pay for the product, you ARE the product.

      Well, that’s not always true. I don’t pay for Wikipedia, am I the product?

    • @MossBear
      2 · 2 months ago

      Explain how I’m the product relative to Linux.

      • @Crack0n7uesday
        2 · edited 2 months ago

        With Linux you pay for support if you ever need it. Most end users will never need support, but businesses running Linux servers pay Red Hat a shit load to support them in case shit ever hits the fan. Like giving away a free car, but only certain people know how to do maintenance on it, and they all work for the manufacturer.

        • @MossBear
          1 · 2 months ago

          I’m not a business, so it doesn’t apply to me.

      • @RizzRustbolt
        1 · 2 months ago

        Have you told anyone to switch to Linux?

  • kirbowo808
    12 · 2 months ago

    Well, time to delete my Wordpress account then. Gonna be a lot of content I gotta archive before then. ;-;

  • @phoneymouse
    11 · 2 months ago

    All of this is predicated on having some company that can afford to pay and wants this data. Or, the next tech bubble will just be VCs throwing money at AI companies training their models on the old internet.

  • AutoTL;DRB
    8 · 2 months ago

    This is the best summary I could come up with:


    To complicate matters even further, advertising content that isn’t even owned by Automattic, including ads from an old Apple Music campaign, has also reportedly made its way into the training data set.

    The plans at Automattic have been so controversial internally, that a product manager has even started pulling his own photos off Tumblr to make sure they’re not used to train AI, according to 404.

    Generative AI has become a big business ever since OpenAI first launched ChatGPT in late 2022 and text-prompt image creators soon followed from a number of companies.

    But major publishers have complained, with some even filing lawsuits, alleging that much of the data used to train these systems was either pirated or doesn’t constitute “fair use” under existing copyright regimes.

    In response to emailed questions on Tuesday, Automattic directed Gizmodo to a new post that more or less confirmed 404 Media’s reporting, while trying to sell the move to consumers as an opportunity to “give you more control over the content you’ve created.”

    “We also plan to take that a step further and regularly update any partners about people who newly opt-out and ask that their content be removed from past sources and future training.”


    The original article contains 536 words, the summary contains 201 words. Saved 62%. I’m a bot and I’m open source!

  • @saltesc
    6 · edited 2 months ago

    I wish I had content and data to sell :(

    Oh, wait, I do. But companies are already selling it :(

  • Nikelui
    4 · 2 months ago

    I wonder if there is a text equivalent of Glaze and Nightshade, to perform adversarial attacks on AI scraping the text.