[News] AI researchers say they've found 'virtually unlimited' ways to bypass Bard and ChatGPT's safety rules

@pavnilschanda · 2 years ago

[News] AI researchers say they've found 'virtually unlimited' ways to bypass Bard and ChatGPT's safety rules

@kromem · 2 years ago

These kinds of attacks are trivially preventable, it just requires making requests 2-3x as expensive, and literally no one cares enough about jailbreaking to do that other than the media acting like jailbreaking is such an issue.

If you use a Nike shoe to smack yourself in the head, yes, that could be pretty surprising and upsetting compared to the intended uses. But Nike isn’t exactly going to charge their entire userbase more in order to safety-proof the product from you smashing it into your face.

The jailbreaking issue is only going to matter when you have shared persistence resulting from requests, and at that point in time, you’ll simply see a secondary ‘firewall’ LLM discriminator explicitly checking request and response for rule-breaking content or jailbreaking attempts before writing to a persistent layer.

As long as responses are only user-specific, this is going to remain a non-issue with unusually excessive news coverage as it’s headline grabbing and not as nuanced as real issues like biases or hallucinations.