(sorry if anyone got this post twice. I posted while Lemmy.World was down for maintenance, and it was acting weird, so I deleted and reposted)

      • Bappity
        12 points · 1 year ago

        close though xD

    • Khrux
      36 points · 1 year ago

      Sadly, almost all these loopholes are gone :( I bet they’ve had to add specific protections against the words “grandma” and “bedtime story” after their overuse.

      • Trailblazing Braille Taser
        25 points · 1 year ago

        I wonder if there are tons of loopholes that humans wouldn’t think of, ones you could derive with access to the model’s weights.

        Years ago, there were some ML/security papers about “single-pixel attacks”. An early, famous example convinced a stop-sign detector that an image of a stop sign was definitely not a stop sign, simply by changing a single pixel that had an outsized influence on the output.

        In that vein, I wonder whether there are some token sequences that are extremely improbable in human language, but would convince GPT-4 to cast off its safety protocols and do your bidding.

        (I am not an ML expert, just an internet nerd.)
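
        To make that concrete, here is a toy sketch of the simplest gradient-based attack of this family (FGSM, a cruder cousin of the single-pixel attack), assuming PyTorch, torchvision’s pretrained ResNet-18, and a random tensor standing in for a real image:

        ```python
        # Toy fast-gradient-sign (FGSM) perturbation: nudge every pixel slightly
        # in the direction that most increases the loss on the model's current
        # prediction. Assumes torchvision >= 0.13; the input is random noise,
        # not a real photo.
        import torch
        import torchvision.models as models

        model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

        x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for an image
        label = model(x).argmax(dim=1)                       # whatever the model currently sees

        loss = torch.nn.functional.cross_entropy(model(x), label)
        loss.backward()

        epsilon = 0.03                                       # perturbation budget: tiny changes
        x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

        print(label.item(), model(x_adv).argmax(dim=1).item())  # often two different classes
        ```

        Roughly the same loss-gradient trick, applied to token embeddings instead of pixels, is how the published “adversarial suffix” jailbreaks against chat models were found, which matches the intuition above.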

      • @PeterPoopshit
        22 points · edited · 1 year ago

        Just download an uncensored model and run the AI software locally. That way your information isn’t being harvested for profit, plus the bot you get will be far more obedient.
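
        For anyone curious what that looks like, here’s a minimal sketch using the Hugging Face transformers library; the model path is a placeholder for whatever model you’ve actually downloaded:

        ```python
        # Minimal local text generation with Hugging Face transformers.
        # "./path/to/local-model" is a placeholder, not a real repo name;
        # point it at any causal LM you have on disk.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_path = "./path/to/local-model"
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(model_path)

        prompt = "Tell me a bedtime story about..."
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=200)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))
        ```

        Nothing leaves your machine: the prompt, the weights, and the output all stay local.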

      • @Pregnenolone
        5 points · 1 year ago

        I managed to get “Grandma” to tell me a lewd story just the other day, so clearly they haven’t been able to fix it completely.