Large language models (LLMs), such as the model underpinning the functioning of the conversational agent ChatGPT, are becoming increasingly widespread worldwide. As many people are now turning to LLM-based platforms to source information and write context-specific texts, understanding their limitations and vulnerabilities is becoming increasingly vital.
ELI5 why this is a concern. Somehow the LLM is dangerous be cause an academic can hack it and manipulate it, versus rando reading all the bank robber biographies. Neither of which is nearly as dangerous as the person sitting outside the bank all day studying all activity, and even that is a silly Hollywood strategy
The danger isnt really that someone might trick an LLM into saying something offensive. The problem is that lots of people want to employ LLMs to make decisions that humans currently make. In order to do that theyll have to have access to sensitive information and the authority to make binding decisions. An exploit that can trick an LLM into discussing forbidden things might also be used to make a future LLM leak sensitive information, or make it agree to terms that it should not.
ELI5 why this is a concern. Somehow the LLM is dangerous be cause an academic can hack it and manipulate it, versus rando reading all the bank robber biographies. Neither of which is nearly as dangerous as the person sitting outside the bank all day studying all activity, and even that is a silly Hollywood strategy
LLMs with crypto- that’s the heist
The danger isnt really that someone might trick an LLM into saying something offensive. The problem is that lots of people want to employ LLMs to make decisions that humans currently make. In order to do that theyll have to have access to sensitive information and the authority to make binding decisions. An exploit that can trick an LLM into discussing forbidden things might also be used to make a future LLM leak sensitive information, or make it agree to terms that it should not.
thx
seems we not