• Eager Eagle
    26
    9 months ago

    Good move, but anyone using public data already applies a simple spam filter to reject “dumb” data poisoning. Also, hatred and other negative comments as responses will be penalized during language model training, so effective data poisoning takes effort. I’ll just throw out some ideas here for how poisoning could hypothetically have a tangible negative impact on their results.
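
    As a rough illustration of the kind of filter that catches “dumb” poisoning (a toy sketch; the function name and thresholds are invented, not any platform’s actual pipeline):

    ```python
    # Hypothetical heuristic filter. Thresholds are illustrative only.
    def looks_like_dumb_poisoning(comment: str) -> bool:
        words = comment.lower().split()
        if not words:
            return True
        # Heavy repetition of a single token is an easy giveaway.
        top_freq = max(words.count(w) for w in set(words)) / len(words)
        # So is a low share of ordinary alphabetic characters (gibberish).
        alpha_ratio = sum(c.isalpha() or c.isspace() for c in comment) / len(comment)
        return top_freq > 0.5 or alpha_ratio < 0.6

    looks_like_dumb_poisoning("buy buy buy buy buy buy")           # flagged
    looks_like_dumb_poisoning("This recipe needs more salt, honestly.")  # passes
    ```

    Real pipelines use far more signals (account age, posting rate, perplexity under a reference model), but even this level of filtering defeats naive repetition or gibberish attacks.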

    The best one can do in terms of data poisoning is make comments that are not easily discernible from usual comments - both for humans and machines - but are either unhelpful or misleading. This is an “in-distribution” data poisoning attack. To be really effective in having any impact whatsoever on training, they need to be mass-applied using different user accounts that also upvote each other’s comments in a way that mimics real user interaction: if applied simplistically, a simple graph analysis of these interactions will light up the fake accounts like a Christmas tree.
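
    A minimal sketch of that graph analysis, with invented sock-puppet data: a ring of accounts that reciprocally upvote each other has near-total mutual-edge density, while organic voting patterns don’t:

    ```python
    from itertools import combinations

    # Toy "upvote graph" (made-up accounts): an edge (a, b) means
    # "account a upvoted a comment by account b".
    upvotes = {
        # a suspicious ring: everyone votes for everyone else, both ways
        ("sock1", "sock2"), ("sock2", "sock1"),
        ("sock1", "sock3"), ("sock3", "sock1"),
        ("sock2", "sock3"), ("sock3", "sock2"),
        # organic users: sparse, mostly one-directional
        ("alice", "bob"), ("carol", "bob"), ("bob", "dave"),
    }

    def mutual_edge_density(group: set[str]) -> float:
        """Fraction of pairs in `group` that upvote each other BOTH ways."""
        pairs = list(combinations(group, 2))
        mutual = sum((a, b) in upvotes and (b, a) in upvotes for a, b in pairs)
        return mutual / len(pairs)

    mutual_edge_density({"sock1", "sock2", "sock3"})   # 1.0 - lights up
    mutual_edge_density({"alice", "bob", "carol"})     # 0.0 - looks organic
    ```

    Production defenses would use richer features (timing, IP clusters, community detection on the whole graph), but high reciprocity inside a small clique is exactly the “Christmas tree” signal described above.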

    • @[email protected]
      23
      9 months ago

      but are either unhelpful or misleading

      Honestly that just sounds like a lot of Reddit users in general

      • Darth_Mew
        7
        9 months ago

        yea we know, that’s why he said it: because that’s “real” reddit content

    • @Adalast
      3
      9 months ago

      I was contemplating the merits of botting using the current model with slight vectorization offsets, so the data becomes prone to overfitting.

      I would think it would also work to post using valid but non-standard syntax, so it muddies the n-gram searches.
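
      A toy sketch of why that muddies n-grams (invented sentences; real tokenizers are more robust than a plain `split()`): the same idea phrased with non-standard punctuation shares almost no word trigrams with the standard phrasing:

      ```python
      def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
          """Collect the word n-grams of `text` after naive whitespace tokenization."""
          toks = text.lower().split()
          return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

      standard = "i think the model will overfit on this data"
      # same meaning, valid-ish but non-standard punctuation
      muddied = "i think; the model -- it will overfit, on this data"

      a, b = word_ngrams(standard), word_ngrams(muddied)
      overlap = len(a & b) / len(a)   # only ('on', 'this', 'data') survives
      ```

      Stray punctuation fused to tokens fragments nearly every trigram, so naive n-gram matching against the standard phrasing mostly fails; whether that survives a real normalization pass is another question.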