Maven, a new social network backed by OpenAI’s Sam Altman, found itself in a controversy today when it imported a huge number of posts and profiles from the Fediverse and then ran AI analysis to alter the content.

    • @[email protected]
      145 points, 6 months ago

      The wildest part is that he’s surprised that Mastodon peeps would react negatively to their posts being scraped without consent or even notification and fed into an AI model. Like, are you for real dude? Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI? Come the hell on…

      • @danc4498
        32 points, 6 months ago

        People can complain, but the Fediverse is built to make consuming user’s data easy. If you don’t want AI using your data, don’t put it on such an easily “scrapable” network.

        • @[email protected]
          47 points, 6 months ago

          Yeah, and girls dress for rape. They are just aaasking for it!

          I will go off on a tangent.

          Just because something is online it does not mean I give a full green light on anything.

          Fuck this noise of social parasitic networks hammering free service therefore pay with data into everyone’s skull. And everyone posts crap.

          It is a billion dollar business. LLMs are extracting millions and will generate more.

          You know why? Because worthless shit you post online is not worthless after all.

          Yes, you are reading it right. Pay me. Pay us.

          Before anyone ridicules this. Yall be defending billion dollar corporations, staffed with millionaires below C-levels.

          People should start demanding money from these greedy assholes.

          • circuscritic
            23 points, edited, 6 months ago

            I don’t think they’re making a moral argument, but pointing out the reality of the situation as it stands.

            This is a problem that can only be fixed through legislation and aggressive enforcement backed by large punitive actions.

            Until that happens, it’s better to acknowledge and understand the reality of the situation, than to believe that a morally righteous condemnation will somehow unmake that reality.

            It sucks. I agree with your philosophical stance, except for the payment for personal data, as I’d prefer a complete opt-out. However, none of that changes where we’re at right now.

          • @GlitterInfection
            1 point, 6 months ago

            A mild copyright violation based on a system designed around the constant distribution of copies of things is NOT a parable about sexual violence, people.

            I feel like this extremely insensitive rape take is the fediverse’s version of the Godwin Law.

          • @[email protected]
            1 point, 6 months ago

            ITT people not recognizing that there’s a difference between comparing and equating.

            People, it’s possible to make analogies to more serious situations without saying the two things are equal. The statement above is saying there’s a shared mentality, not a shared level of consequence/seriousness.

          • Phoenixz
            -2 points, 6 months ago

            You’re right but…

            It’s the same with open source products. Companies just take it, make billions off it, give nothing back, will try the embrace, extend, extinguish tactics, will hide any GPL licensing because of course they would…

            It’ll happen anyway, and you can’t stop it. Like you said, girls dress to rape is bullshit. But if a girl goes in a skimpy bikini in a Bombay bus at 9pm, then you’re kind of asking for something. Open source is open for everyone, that is kind of the point, it’s the reason why it became so big in the first place, but it WILL be abused because there are always abusers out there

          • @[email protected]
            -4 points, edited, 6 months ago

            Are you seriously comparing having your shitty Internet comments scraped by AI to someone actually raping you? Wtf?

        • @bbuez
          15 points, 6 months ago

          Alternatively, use a closed ecosystem susceptible to data rot and loss.

          Want to contribute to our open source project? Join our discord

          Would you want art to be unfindable because scraping for AI image generation happens? It’s a solution looking for problems.

        • Scrubbles
          8 points, 6 months ago

          This is what I’ve been saying the entire time. It sucks, and it’s wrong, but the fediverse is built from the ground up as an open sharing platform, where our data is shared with anyone. It shouldn’t be, and it’s wrong, but there is nothing to stop anyone from doing it. To change that would alter federation at a core level.

          • @danc4498
            13 points, 6 months ago

            I would rather my content be open to the world for however it wants to use it than owned by a single company that gets to profit off aggregating and selling it.

            • Scrubbles
              4 points, 6 months ago

              Fully agree. The annoyances of free and open are vastly outweighed by the negatives

          • @[email protected]
            2 points, 6 months ago

            Yeah, but doesn’t hubzilla (https://hubzilla.org/page/info/discover) apply a privacy layer to how its content is distributed? The issue then also lies in how the social network is implemented for its purpose; the hubzilla vs lemmy case, for instance, is a public board vs a social network.

            • @[email protected]
              2 points, 6 months ago

              If it ends up being ruled that training an LLM is fair use so long as the LLM doesn’t reproduce the works it is trained on verbatim, then licensing becomes irrelevant.

            • Scrubbles
              2 points, 6 months ago

              I’ve had this argument with other people, but essentially at this point there is no licensing beyond server ownership here, and most servers don’t have any licenses defined. Even if they do, then sure they did something wrong… but how would you ever prove it or enforce it? The only way to actually disallow them is to switch from open federation to closed - which goes against what we’re trying to build with federation.

              • @[email protected]
                0 points, edited, 6 months ago

                There have been instances before where LLMs gave up clues as to what sources they used. When that happens, they can be sued.

                I’m okay with people using our data for whatever, since it’s all open and it should be. But I’d rather put in a little bit of effort to make for-profit use technically illegal. It’s better than nothing.

        • @[email protected]
          1 point, 6 months ago

          Just because our data is accessible doesn’t mean it’s legally licensed to be used by a for-profit company. Free doesn’t mean you can do what you want with it, it just means no cost.

          • @danc4498
            2 points, 6 months ago

            I don’t disagree. I’m just saying that so long as you’re putting content on this platform, you are powerless to stop any service from using the features of the platform in whatever way they want.

            It was built for easy and open consumption of user content by other services.

            • @[email protected]
              1 point, 6 months ago

              Oh yeah for sure. Anything I type here is for the whole world to see and I’m okay with that as long as it’s anonymous.

        • @[email protected]
          1 point, 6 months ago

          > People can complain, but the Fediverse is built to make consuming user’s data easy

          Correction: it is built to make consuming users’ data not easy, but more human.

          What you are thinking of is AP, not “Fediverse”, and even then that’s a stretch.

          • @danc4498
            2 points, 6 months ago

            > Correction: it is built to make consuming users’ data not easy, but more human.

            What does that even mean?

            > What you are thinking of is AP, not “Fediverse”, and even then that’s a stretch.

            Honestly, I think Fediverse is inseparable from AP (or some similar protocol). You can split hairs if you want, but the thing that makes it different from all other social media services is that it allows the content created by users on one service to be imported into a different service.

            You can hope and dream that it is only services like Lemmy consuming user content from services like Mastodon, but this same protocol makes it easy for services like ChatGPT to consume the same data.

      • FaceDeer
        10 points, 6 months ago

        It sounds like they weren’t “being fed into an AI model” as in being used as training material, they were just being evaluated by an AI model. However…

        > Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI?

        Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn’t matter what you’re actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

        It sounds like Maven wants to play nice, but if the “general attitude” means that playing nice is impossible why should they even bother to try?

        • @[email protected]
          6 points, 6 months ago

          The anti-AI knee-jerk reactions can be extreme, I agree, but at the same time one of the important features of Mastodon is that your feed is not controlled by an algorithm in any way.

          So when a company comes, takes those posts and screws with them to create an algorithm to show them, I understand people getting angry - at least some of them joined to be free of that exact thing…

          • FaceDeer
            8 points, 6 months ago

            One of the important features of Mastodon is that you can choose what your feed is. Everyone’s feed has an algorithm determining what’s in it even if it’s just a simple “list the posts of everyone I’ve subscribed to in chronological order.”

            If someone else wants to see a feed of content that is curated and sorted in a different way, why get angry at them? They’re not forcing you to see that feed.
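To make the "every feed has an algorithm" point concrete, here is a minimal sketch of the simple chronological feed described above. The `Post` type and the `chronological_feed` function are made up for illustration; this is not how Mastodon or any other server actually implements its timeline.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    # Hypothetical post record; real servers store far more than this.
    author: str
    posted_at: datetime
    text: str


def chronological_feed(posts, subscriptions):
    """Keep only posts by subscribed authors, newest first."""
    followed = set(subscriptions)
    visible = [p for p in posts if p.author in followed]
    # Sorting by timestamp, descending, is the entire "algorithm" here.
    return sorted(visible, key=lambda p: p.posted_at, reverse=True)


posts = [
    Post("alice", datetime(2024, 6, 1, 9, 0), "morning post"),
    Post("mallory", datetime(2024, 6, 1, 10, 0), "not followed"),
    Post("bob", datetime(2024, 6, 1, 11, 0), "later post"),
]
feed = chronological_feed(posts, ["alice", "bob"])
```

A curated or ranked feed would swap out the sort key (and possibly the filter) for something else, which is the kind of choice being argued about in this thread.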

        • @[email protected]
          2 points, 6 months ago

          > Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn’t matter what you’re actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

          This wasn’t always the case. A lot of research on NLP uses scraped social media posts (2010’s). People never had a problem with that (at least the outrage wasn’t visible back then). The problem now is that our content is being used to create an AI product where there is zero consent taken from the end-user.

          Source: My research colleagues used to work on NLP

          • @[email protected]
            4 points, 6 months ago

            For me, more specifically, the problem is they took my data and made a tool to sell it back to me without paying me for it.

            I have no real issue with current ai stuff, other than you’re effectively taking our stuff and want us to pay you for doing so.

            If they weren’t freeloading on everyone, I suspect you’d have a lot less angry people.

            • @[email protected]
              1 point, 6 months ago

              This. If Maven offered me a stipend for life to have my content used (because they’re not going to remove it in 3 or 6 months, right? once ingested it’s there forever), then I would be far more open to at least discussing their terms.

          • @[email protected]
            1 point, 6 months ago

            Consent isn’t legally required if it’s fair use. Whether it’s fair use remains to be ruled on by the courts.

      • @Etterra
        6 points, 6 months ago

        He’s not surprised. He’s acting surprised because he got caught. It’s pretty standard for these jerkass tech bros. “Move fast, break things” is code for “break laws, be unethical”. As I think we’ve all seen, if you do it often and fast enough you can keep way ahead of any kind of accountability, because everybody else is trying to play catch-up while the last thing has already filtered out of the news cycle.

      • @deafboy
        -4 points, 6 months ago

        I’m surprised as well. We put our posts up for anyone to replicate and republish, yet we still get mad when somebody replicates and republishes it. It does not make sense. Activitypub is an open network with zero privacy expectations.

        • @[email protected]
          6 points, 6 months ago

          And yet we don’t want our posts to be fed into AI slop, nor do we want independent hosts to pay for the massive amount of traffic generated by a massive corporate entity trying to consume data en masse.

    • @[email protected]
      6 points, 6 months ago

      Look at that shit-eating grin, he knows. There’s no way someone can be that out of touch, right? Right?!?

    • @disguy_ovahea
      2 points, 6 months ago

      How does someone with a last name that close to secretion choose to go by Jimmy?

  • @lunarul
    88 points, 6 months ago

    I was confused why a package manager would need to import posts from a social network.

    Why name a new product the same as a very popular existing product?

  • threelonmusketeers
    40 points, 6 months ago

    I was confused on what they were trying to accomplish, and even after reading the article I am still somewhat confused.

    > Instead, when a user posts something, the algorithm automatically reads the content and tags it with relevant interests so it shows up on those pages. Users can turn up the serendipity slider to branch out beyond their stated interests, and the algorithm running the platform connects users with related interests.

    Perhaps I’m a minority, but I don’t see myself getting much utility out of this. I already know what my interests are, and don’t have much interest in growing them algorithmically. If a topic is really interesting, I’ll eventually find out about it via an actual human.

    • @[email protected]
      28 points, 6 months ago

      Yeah, we’re trying to get the fuck away from algorithms. That’s what makes the fediverse such a big draw currently, for me.

      • Scrubbles
        13 points, 6 months ago

        Only algorithm I need is posts I subscribe to, in descending order. That’s about it

      • FaceDeer
        3 points, 6 months ago

        You’re on slrpnk.net, I assume it’s not implementing any of this stuff. As long as you don’t sign up for Maven I don’t see how this is going to affect you.

        • @[email protected]
          7 points, edited, 6 months ago

          I mean yeah, maybe it won’t affect me directly, I like the instance I’m on and it’s a pretty respectable one. However, indirectly, this is very relevant to any Fediverse user, regardless of the instance or platform they’re using. Allowing abuses like this to happen without any pushback is a surefire way of turning this place into a shithole just like the rest of the internet. I appreciate the fact that, at least for now, it’s different here.

          Also, maybe this isn’t my only homebase? Just saying.

    • @Zak
      11 points, 6 months ago

      TikTok is really popular operating on essentially the same principle. I, for one, want nothing to do with that.

    • @Plopp
      1 point, 6 months ago

      > Instead, when a user posts something, the algorithm automatically reads the content and tags it with relevant interests so it shows up on those pages.

      Motherfucker this is what hashtags are for.

    • @[email protected]
      0 points, edited, 6 months ago

      So you don’t ever want to learn about new things? And even if you did, you wouldn’t want those new things be efficiently suggested to you and instead be bundled with a bunch of other boring crap?

      Also, what you’re asking for is what the tool seems to do. You would put the slider all the way to one side to avoid having new stuff suggested. Existing social media platforms often just shove stuff at you endlessly.

  • @[email protected]
    29 points, edited, 6 months ago

    That’s why I keep saying it’s pointless to defederate corpos. They’ll just scrape everything before you notice.

    • zoey
      28 points, 6 months ago

      The fact they even got DMs from at least one instance is crazy.

      • @mke
        27 points, edited, 6 months ago

        And it’s also damning for private messaging on mastodon.

        I once read vague complaints about it being a rushed implementation. While I won’t trust those without evidence, I for sure wouldn’t trust mastodon with my PMs. At least, not until how this was allowed to happen is figured out and fixed if necessary.

        P.S. I’m still not sure I believe in PMs in the fediverse. If I need to share something and care about keeping it private, I’d rather move the conversation elsewhere.

        • @[email protected]
          17 points, 6 months ago

          I was under the impression that DM’s on Mastodon (and Lemmy too) weren’t ever stated as being secure and I think that they were both pretty transparent about this particular aspect.

          • @mke
            10 points, 6 months ago

            You’re right, regarding Mastodon. I won’t edit my other comment, though, both to preserve the original chain of thought and because that brings up another discussion.

            To quote the EFF:

            > We feel that the intended usage of the feature will not determine people’s expectation of privacy while using it.

            Offering people a feature with preexisting expectations, similar to other things that fulfill those expectations, then telling people “We know it looks like a duck but don’t expect it to quack!”

            …It begs the question: was the feature really a good idea?

          • @[email protected]
            1 point, 6 months ago

            That’s right; they’ve always been documented to be DMs, not PMs.

            But because of the discordbabies people confuse both.

      • @[email protected]
        1 point, 6 months ago

        Well the problem is user perception/understanding.

        The reality is they were literally direct messages, not private messages.

    • @[email protected]
      9 points, 6 months ago

      Defederation is more about not being flooded with 1000x more users than the Fediverse currently has

      • @[email protected]
        1 point, 6 months ago

        Unfortunately a lot of people think it’s to do with scraping as well. The amount of “defederate Threads so that they can’t scrape my data” posts I saw was about 50-50 with the sensible takes.

      • @[email protected]
        1 point, 6 months ago

        So far we only have a corpo fedi-twitter in the form of Threads. In that case a non-corpo instance user has to specifically follow someone before their content is federated, so that sounds like a somewhat overblown issue.

          • @[email protected]
            1 point, 6 months ago

            There’s no real harm in that unless they spam, at which point those accounts can be banned which shouldn’t overwhelm moderators.

    • Pennomi
      3 points, 6 months ago

      Plus even if you defederate them, oops, it’s all public anyway!

  • @[email protected]
    13 points, 6 months ago

    Oh shit, the persona guy was right! We should all be adding a license to our comments, so they could not legally train models that are then used for commercial purposes.

    • Pennomi
      18 points, 6 months ago

      The easiest way is a sitewide NoAI meta tag, since it’s the current standard. Researchers are much more likely to respect a common standard and extremely unlikely to respect a single user’s personal solution adding a link to their comments.
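For anyone wondering what honoring such a tag could look like on the scraper's side, here is a rough sketch. It assumes the tag takes the form `<meta name="robots" content="noai, noimageai">`, which, as far as I know, is the convention DeviantArt proposed; it is not a formal standard, and compliance is entirely voluntary. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def page_opts_out_of_ai(html):
    """Return True if the page carries the assumed 'noai' robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noai" in parser.directives


opted_out = page_opts_out_of_ai(
    '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
)
```

A well-behaved crawler would run a check like this before ingesting a page. Nothing technically forces a crawler to do so, which is the limitation raised in the replies.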

      • Scrubbles
        6 points, 6 months ago

        This is the only way I see it being acceptable. How do we add this to instances?

      • @iAvicenna
        4 points, 6 months ago

        I feel like the bad thing about this is, whereas the researchers will mostly respect this, companies who want to make money out of data will still secretly keep using the data anyways. I am more ok with the data being used for non-profit research and not for making money but this would likely have the opposite effect.

        • Pennomi
          1 point, 6 months ago

          If that’s truly the case, nothing on earth can protect your data.

          That being said, large corporations are far more liable to consumer protection lawsuits, especially in areas like the EU.

          • @iAvicenna
            2 points, edited, 6 months ago

            They also have enough lawyer power to find loopholes. Stuff like: if your main compute cluster is in xyz state or in xyz islands, then you can get away with a fine that is a fraction of what you can make with this data.

      • @[email protected]
        1 point, 6 months ago

        Why do you think it won’t hold water legally? There’s a case going right now against Github Copilot for scraping GPL-licensed code, even spitting it back out verbatim, and not making “open” AI actually open.

        Creative Commons is not a joke licence. It actually is used by artists, authors, and other creative types.

        Imagine Maven or another company doing the same shit they just did and it coming to light there was a bunch of noncommercially licensed content in there. The authors could band together for a class action lawsuit and sue their asses. Given the reaction of users here and on mastodon, I wouldn’t even be surprised if it did happen.

        Anti Commercial-AI license

          • Venia Silente
            1 point, 6 months ago

            Don’t we also need a critical mass of people adding licenses to posts? So that a class action suit can be launched. Because it would be inviable and a very rapid path to self-defeat if people started to try and individually sue big corpo.

            Also I’m missing a way to automatically add this to my posts. Something like a browser extension.

            This post is licensed under CC BY-NC-SA 4.0.

              • Venia Silente
                1 point, 6 months ago

                Also for me I’m using a text expander so that after I type a shortcut it automatically adds the rest of the text for me.

                I request of you, show me your ways!

                • @[email protected]
                  1 point, 6 months ago

                  Well on firefox/chrome extensions you can search for text expander and choose an extension that works for you.

                  Or if you are using a phone you can do the same on the app store and I think there should be a few options.

                  Once you download one of them it should give instructions on how to use it, but in general it asks you to create a phrase that you want to be automatically triggered and a shorter phrase that is automatically replaced with the longer phrase.

                  For example-

                  long phrase: The quick brown fox jumped over the moon.

                  short phrase: /qfox

                  and every time you typed /qfox it would replace it with “The quick brown fox jumped over the moon.”

                  Anti Commercial-AI license (CC BY-NC-SA 4.0)
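If it helps to see the idea in code, a text expander boils down to a lookup table and a find-and-replace. This is only a toy sketch: `expand_shortcuts` and the `/lic` shortcut are invented for illustration, and real browser extensions hook into keyboard input rather than processing finished text.

```python
import re


def expand_shortcuts(text, shortcuts):
    """Replace every shortcut in text with its long phrase, in a single pass."""
    if not shortcuts:
        return text
    # Try longer shortcuts first so "/qfox2" is not mistaken for "/qfox".
    keys = sorted(shortcuts, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass means an expansion is never expanded a second time.
    return pattern.sub(lambda match: shortcuts[match.group(0)], text)


shortcuts = {
    "/qfox": "The quick brown fox jumped over the moon.",
    # Hypothetical shortcut for a license signature.
    "/lic": "Anti Commercial-AI license (CC BY-NC-SA 4.0)",
}
signed = expand_shortcuts("Great point. /lic", shortcuts)
```

Here `signed` ends up as the comment text followed by the full license line, which is all the signature trick amounts to.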

    • @[email protected]
      1 point, 6 months ago

      It’s especially for these kinds of dumb cases where they simply copy content wholesale and boast about it. With more people licencing their contents as non commercial, the “hot water” these companies get in could not just be trivial but actually legal.

      Would be great if web and mobile clients supported signatures or a “licence” field from which signatures were generated. Even better would be if people smarter than me added a feature to poison AI training data. This could also be done by a signature or some other method.

      Anti Commercial-AI license

      • @[email protected]
        1 point, 6 months ago

        I don’t know; AFAIK, Reddit successfully argued that they own Wallstreetbets’ trademarks in court. That might void all of these licenses depending on the ToS of the instance being used.

  • @Larry
    10 points, 6 months ago

    Am I misunderstanding this, or did they just fuck up the integration so it’s one way with a plan to make it two ways after, and the AI alteration is just sentiment analysis on whatever they took?

    • FaceDeer
      13 points, 6 months ago

      Looks like it.

      > In addition to pulling in posts, the import process seems to be running AI sentiment analysis to add tags and relational data after content reaches Maven’s servers. This is a core part of Maven’s product: instead of follows or likes, a model trains itself on its own data in an attempt to surface unique content algorithmically.

      But of course, that news doesn’t give the reader those lovely rage endorphins or draw clicks.

      This is the Fediverse, having the content we post get spread around to other servers is the whole point of all this. Is this a face-eating leopard situation? People are genuinely surprised and upset that the stuff we post here is ending up being shown in other places?

      There is one thing I see here that raises my eyebrows:

      > Even more shocking is the revelation that somehow, even private DMs from Mastodon were mirrored on their public site and searchable. How this is even possible is beyond me, as DM’s are ostensibly only between two parties, and the message itself was sent from two hackers.town users.

      But that sounds to me like a hackers.town problem, it shouldn’t be sending out private DMs to begin with.

    • Sean TilleyOP
      9 points, 6 months ago

      They kind of fucked up everything in approaching this by not talking to the community and collecting feedback, making dumb assumptions in how the integration was supposed to work, leaking private posts, running everything through their AI system, and neglecting to represent the remote content as having come from anywhere else.

      The other thing is that Maven’s whole concept is training an AI over and over again on the platform’s posts. Ostensibly, this could mean that a lot of Fediverse content ended up in the training data.

  • @[email protected]
    5 points, 6 months ago

    Genuine question, do instances not have a GPL license on their content? With that license, anyone can use all the data but only for open source software.

    • @GamingChairModel
      3 points, 6 months ago

      Instances don’t actually own the copyright to comments. The poster owns the copyright and licenses it to the instance. Which lets the instance use it, but not sublicense to others.

    • @Spedwell
      2 points, 6 months ago

      The current assumption made by these companies is that AI training is fair use, and is therefore legal regardless of license. There are still many ongoing court cases over this, but one case was already resolved in favor of the fair use position.

    • @[email protected]
      1 point, 6 months ago

      I don’t think you can use gpl for anything but code. Creative commons license would be more appropriate.

  • Flax
    1 point, 6 months ago

    Does Maven have anything to do with AI despite being backed by a dude who works for OpenAI?

    • Sean TilleyOP
      1 point, 6 months ago

      Yes, the entire platform trains itself on posts within its platform to make algorithmic decisions and present it to users. Instead of likes or follows, you just have that.

      • Flax
        1 point, 6 months ago

        But it doesn’t actually produce content that’s AI generated by an LLM model?

  • katy ✨
    -1 points, 6 months ago

    yeah but who posts to mastodon under public instead of unlisted/quiet public?

    • Sean TilleyOP
      8 points, 6 months ago

      Pretty much everybody.