IT consultant Mark Pesce was building an LLM-based similarity finder for a legal client. He discovered a prompt that reliably caused multiple LLMs to go nuts and output complete gibberish: “it desc…
you know what
I’ll do you the courtesy of an even mildly thorough response, despite the fact that this is not the place and that it’s not my fucking job
one of the literal pillars of security intrusions/research/breakthroughs is in the field of exploiting side effects. as recently as 3 days ago there was some new stuff published about a fun and ridiculous way to do such things. and that kind of thing can be done in far more types of environments than you’d guess. people have managed large-scale intrusions/events by the simple matter of getting their hands on a teensy little fucking bit of string.
there are many ways this shit can be abused. and now I’m going to stop replying to this section, on which I’ve already said more than enough.
If you give AI the ability to do anything dangerous then that's your problem, not the AI possibly doing those things. The DAN stuff has been there from the very beginning and I doubt it'll ever fully go away; it shouldn't be considered a security risk imo.