'Indiana Jones' jailbreak approach highlights vulnerabilities of existing LLMs

RSS Bot · 11 hours ago

'Indiana Jones' jailbreak approach highlights vulnerabilities of existing LLMs

@[email protected] · 10 hours ago

ELI5 why this is a concern. Somehow the LLM is dangerous be cause an academic can hack it and manipulate it, versus rando reading all the bank robber biographies. Neither of which is nearly as dangerous as the person sitting outside the bank all day studying all activity, and even that is a silly Hollywood strategy

LLMs with crypto- that’s the heist

jrs100000 · edit-2 10 hours ago

The danger isnt really that someone might trick an LLM into saying something offensive. The problem is that lots of people want to employ LLMs to make decisions that humans currently make. In order to do that theyll have to have access to sensitive information and the authority to make binding decisions. An exploit that can trick an LLM into discussing forbidden things might also be used to make a future LLM leak sensitive information, or make it agree to terms that it should not.

@[email protected] · 9 hours ago

thx

seems we not

'Indiana Jones' jailbreak approach highlights vulnerabilities of existing LLMs

'Indiana Jones' jailbreak approach highlights vulnerabilities of existing LLMs

'Indiana Jones' jailbreak approach highlights the vulnerabilities of existing LLMs