The tech is fine. It’s the insistence that we only sell dull knives to prevent people from intentionally cutting themselves that’s creating impossible expectations.
This attack is the equivalent of resharpening the knife and then licking the blade. It’s not an attack; it’s self-mutilation.
By “attack” they mean “jailbreak”. It’s also nothing like a buffer overflow.
The article is interesting though and the approach to generating these jailbreak prompts is creative. It looks a bit similar to the unspeakable tokens thing: https://www.vice.com/en/article/epzyva/ai-chatgpt-tokens-words-break-reddit
Perhaps they should stop trying to censor the AI then. Open source models already exist which allow you to create all the fake news and whatever other nonsense you want. The only reason companies like OpenAI care about this is because they don’t want to be legally liable.
You can’t stop this from happening, and arguably the world will be better off for it. People should be very skeptical of everything they see online.
That seems like they left debugging code enabled/accessible.
No, this is actually a completely different type of problem. LLMs also aren’t code, and they aren’t manually configured/set up/written by humans. In fact, we kind of don’t really know what’s going on internally when performing inference with an LLM.
The actual software side of it is more like a video player that “plays” the LLM.
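To make the “player” analogy concrete, here’s a rough sketch of what that software side amounts to (using the Hugging Face transformers library and GPT-2 purely for illustration, not what any particular vendor actually runs): a trivial loop that pushes tokens through frozen weights and picks the next one. All the behavior people are trying to jailbreak lives in the weights, not in this code.

    # Minimal "player": a greedy token-by-token generation loop.
    # transformers/GPT-2 are illustrative choices, not any vendor's actual stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(20):                              # generate 20 tokens
            logits = model(input_ids).logits             # forward pass through fixed weights
            next_id = logits[0, -1].argmax()             # greedy: most likely next token
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))

The “player” is a few lines of uninteresting code; everything that makes the model say (or refuse to say) something emerges from the weights it’s playing.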