Large language models (LLMs), such as the model underpinning the conversational agent ChatGPT, are becoming increasingly widespread worldwide. As more people turn to LLM-based platforms to source information and write context-specific texts, understanding their limitations and vulnerabilities is becoming ever more important.
The danger isn't really that someone might trick an LLM into saying something offensive. The problem is that many people want to employ LLMs to make decisions that humans currently make. To do that, the models will have to have access to sensitive information and the authority to make binding decisions. An exploit that can trick an LLM into discussing forbidden topics might also be used to make a future LLM leak sensitive information, or make it agree to terms that it should not.