An anticapitalist tech blog. Embrace the technology that liberates us. Smash that which does not.

    • Jo Miran
      2 months ago

      You are polluting the data set. Do it a few times with different text sources and the scrubbers won’t know what part of your comment history is good. Replace, don’t delete.

      • @[email protected]
        2 months ago

        I’m pretty sure they’ll know that the first version of each comment is almost certainly the good one. People sometimes edit a comment to add new information or fix a typo, but they almost never replace nonsense with a good comment; it’s almost always the other way around.

        Edit: fixed typos, also replaced excerpt from Moby Dick with this post.

        Edit 2: the comments you post here are totally available for machine learning, so I don’t see much of a point in deleting my Reddit comments as long as I’m participating on Lemmy.

        • Jo Miran
          2 months ago

          Maybe. Almost every comment I make I edit. The key is that by doing this you are inserting the possibility. It is actually easier, and safer, to just filter out edited comments than it is to try to sort out what’s good and what isn’t. The bottom line is that the best course of action is to avoid Reddit at all costs. If you do go there and feel compelled to comment, then coming back the next day to replace your comments a few times is better than “deleting”.

          • @Blue_Morpho
            2 months ago

            They don’t need to filter out edited comments. They keep the first version. It’s good enough.

          • @brygphilomena
            2 months ago

            You could easily compare the old and new versions and see how much has changed. If text was only added, the edit is good. If 80% matches, it was probably minor fixes.

            If nothing matches, then remove it from the data set and use the original comment, which I’m sure they still have.
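            The comparison described above can be sketched in Python. This is a minimal illustration, not anything Reddit is known to run: it assumes word-level similarity via the standard library's `difflib.SequenceMatcher`, takes the 80% cutoff from the comment, and uses a hypothetical 20% floor for "nothing matches". The names `classify_edit` and `version_to_keep` are made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical cutoffs: 0.8 mirrors the "80% matches" figure in the comment,
# 0.2 as the "nothing matches" floor is an assumption for illustration.
MINOR_FIX_RATIO = 0.8
UNRELATED_RATIO = 0.2


def classify_edit(original: str, edited: str) -> str:
    """Label an edit as 'addition', 'minor_fix', 'replaced', or 'unclear'."""
    # An edit that keeps the whole original and appends to it counts as added info.
    if original in edited and len(edited) > len(original):
        return "addition"
    # Compare word by word so unrelated texts don't score on shared letters.
    ratio = SequenceMatcher(None, original.split(), edited.split()).ratio()
    if ratio >= MINOR_FIX_RATIO:
        return "minor_fix"
    if ratio < UNRELATED_RATIO:
        return "replaced"
    return "unclear"


def version_to_keep(original: str, edited: str) -> str:
    """Trust additions and small fixes; otherwise fall back to the original."""
    if classify_edit(original, edited) in ("addition", "minor_fix"):
        return edited
    return original
```

            Anything that isn't clearly an addition or a small fix falls back to the original, which is the "keep the first version" approach mentioned upthread.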

            • @Blue_Morpho
              2 months ago

              They know people are messing with the data so they aren’t going to trust anything changed a few days after first posted.

      • @[email protected]
        2 months ago

        Not in a meaningful way. It’s easy to detect and revert a change like this. Instead of bulk changing all your comments, you should slowly change them over time.

        Even then, users don’t usually edit most of their comments. Sure, Reddit might be naive and just take the current comments, but it’s pretty trivial to reverse this kind of thing.

        Probably good to do it to make this process harder and more error-prone for Reddit, but I would not be under the impression that this has an impact beyond being annoying.

      • @[email protected]
        2 months ago

        Or it’ll help train the AI to recognize when that happens and more easily parse history for the relevant stuff.

        • @[email protected]
          2 months ago

          This already happened last year during the Reddit exodus. The AI models either validate the data or they don’t. This has a chance of working, which is better than doing nothing at all.

      • @[email protected]
        2 months ago

        Over a long period, sure. If they see a spike where, say, 25% of a user’s comments are changed in a day, then they’ll just use the snapshot from the day before (day -1).
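        A rough sketch of that spike check, assuming the scraper knows when each of a user's comments was edited and keeps dated snapshots. The 25% threshold is the comment's own example; `first_spike_day` and `snapshot_to_use` are hypothetical names, and nothing here reflects Reddit's actual pipeline.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Optional

# The "say, 25%" figure from the comment; illustrative, not a known Reddit setting.
SPIKE_FRACTION = 0.25


def first_spike_day(edits: list[tuple[str, date]], total_comments: int,
                    threshold: float = SPIKE_FRACTION) -> Optional[date]:
    """Earliest day on which at least `threshold` of a user's comments were edited."""
    if total_comments <= 0:
        return None
    # Count distinct comments per day, so several edits to one comment count once.
    edited_per_day: dict[date, set[str]] = defaultdict(set)
    for comment_id, day in edits:
        edited_per_day[day].add(comment_id)
    spikes = [day for day, ids in edited_per_day.items()
              if len(ids) / total_comments >= threshold]
    return min(spikes) if spikes else None


def snapshot_to_use(edits: list[tuple[str, date]], total_comments: int,
                    today: date) -> date:
    """Use the snapshot from the day before the first spike ("day -1"), else today's."""
    spike = first_spike_day(edits, total_comments)
    return spike - timedelta(days=1) if spike is not None else today
```

        Counting distinct comments rather than raw edit events keeps one heavily re-edited comment from looking like a mass edit.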

  • @Grimy
    2 months ago

    Reddit has a copy of every comment and edit; they probably have copies of things users type but don’t actually end up posting.

    It is brutally trivial to notice mass edits like this.

    The only thing this is doing is making it harder for people scraping it without paying, making what Reddit is selling actually valuable.

    Every edited or deleted comment is more money in their pocket.

  • @[email protected]
    2 months ago

    Let this be a lesson on generating content for a business and not getting paid for it.

    With that said, I’m sure the frog posts are exactly the kind of quality content needed to train an AI.

      • @[email protected]
        2 months ago

        Good point. At least it’s available freely to everyone instead of being used to make a profit on the content itself.

        • Pennomi
          2 months ago

          See, I don’t have any issue with data being free. I have an issue with corporations hoarding it.

  • @hypnicjerk
    2 months ago

    are there copyrighted texts that have such distinctive patterns that they would be particularly easy to spot in an LLM’s output? say, would replacing every comment with a page from moby dick or wuthering heights be more or less infringing than using harry potter? hypothetically.

    • @dual_sport_dork
      2 months ago

      Well, I’m pretty sure Moby Dick is in the public domain by now. If I were you I’d go for something from Disney which is mathematically certain to get somebody sued although I can’t predict who.

  • GregorTacTac
    2 months ago

    My comments on that site are so dumb that AI will not produce any good text after using them as training data.

  • @[email protected]
    2 months ago

    i personally think the comments are worth leaving for people to find later, even if Reddit does use them in an underhanded way.
    i recognize this may not be popular.

    • @w2tpmf
      2 months ago

      Yup. Reddit gives zero fucks about any form of protest or the degradation of the quality of content. They already have the metrics from the traffic the content originally created.

      The only people negatively impacted are people who get pushed there by search engines when trying to find information.

    • @bitchkat
      2 months ago

      You are entitled to your opinion.

    • @Blue_Morpho
      2 months ago

      As someone else brilliantly pointed out, leaving the comments hurts Reddit more than delete/edit.

      Deleting/editing comments only hides the posts from the public. Reddit has the original posts, is ignoring all edits made to posts, and selling that original data.

  • @Ultragigagigantic
    2 months ago

    Privately owned social media platforms are a dead end.

    Libraries should host the people’s internet. Municipal Mastodon has a ring to it, I think.

  • @[email protected]
    2 months ago

    Good luck. Reddit owns your comments now. I deleted all my comments. Got locked out of my own account for doing so and then they reinstated every single comment. As of last year, Reddit has complete control of my account.

    • Jo Miran
      2 months ago

      You did not understand the concept. You cannot “delete” your comments on Reddit. You can mark them as deleted but the comments still exist. If you replace your comments with out-of-context text like excerpts from War and Peace, then the data scrubbers will pick the data up as good. You are polluting the data set. Do it a few times with different text sources and the scrubbers won’t know what part of your comment history is good.

      Do this and it stops being about privacy and it starts being about actively damaging the data sets.

      • @Blue_Morpho
        2 months ago

        They already know that everyone editing old posts is doing it to screw with them.

        As others pointed out they will ignore all edits. Especially edits made after 1 day.

        Losing the rare grammatical fixes or additional info that was changed weeks later doesn’t matter to them. Keeping the version as last changed within 1 day of the original post gives them everything they need.

        The only way to mess with Reddit would be to post wrong information with a community of like-minded redditors who would upvote the bad information.
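        The one-day rule described in this comment amounts to keeping the last revision made within a fixed window after posting. A minimal sketch, assuming each comment's revisions are available as timestamped text; the 24-hour window comes from the comment, and `version_for_training` is a hypothetical name, not a known Reddit or scraper API.

```python
from datetime import datetime, timedelta

# "Within 1 day" per the comment; an assumption, not a documented Reddit rule.
EDIT_WINDOW = timedelta(days=1)


def version_for_training(posted_at: datetime, original: str,
                         revisions: list[tuple[datetime, str]]) -> str:
    """Return the last revision made within EDIT_WINDOW of posting, else the original."""
    cutoff = posted_at + EDIT_WINDOW
    kept = original
    # Walk revisions oldest-first; later edits inside the window replace earlier ones.
    for edited_at, text in sorted(revisions, key=lambda rev: rev[0]):
        if edited_at > cutoff:
            break  # everything after the window is ignored
        kept = text
    return kept
```

        Under this rule a same-day typo fix survives, while a swap to Moby Dick a month later is simply discarded.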

  • @[email protected]
    2 months ago

    I left my comments there, thinking some day I might go back. But it’s been months and I haven’t missed it.

  • @NounsAndWords
    2 months ago

    I think giving their AI my depression is punishment enough.

  • @bitchkat
    2 months ago

    I got banned from /r/AFL because I used Redact to scrub my comments. My how the turntables have turned and they turned out to be the real thin-skinned pansy cunts. Not /r/sports during the kerfuffle.