• AutoTL;DR
    English
    27 months ago

    This is the best summary I could come up with:


    Grok, the edgy generative AI model developed by Elon Musk’s X, has a bit of a problem: With the application of some quite common jail-breaking techniques it’ll readily return instructions on how to commit crimes.

    Red teamers at Adversa AI made that discovery when running tests on some of the most popular LLM chatbots, namely OpenAI’s ChatGPT family, Anthropic’s Claude, Mistral’s Le Chat, Meta’s LLaMA, Google’s Gemini, Microsoft Bing, and Grok.

    When models are accessed via an API or chatbot interface, as in the Adversa tests, the providers of those LLMs typically wrap inputs and outputs in filters and employ other mechanisms to prevent undesirable content from being generated.

    “Compared to other models, for most of the critical prompts you don’t have to jailbreak Grok, it can tell you how to make a bomb or how to hotwire a car with very detailed protocol even if you ask directly,” Adversa AI co-founder Alex Polyakov told The Register.

    “I understand that it’s their differentiator to be able to provide non-filtered replies to controversial questions, and it’s their choice, I can’t blame them on a decision to recommend how to make a bomb or extract DMT,” Polyakov said.

    We’ve reached out to X for an explanation of why its AI - and none of the others - will tell users how to seduce children, and of whether it plans to implement guardrails to prevent subversion of its limited safety features, and haven’t heard back.


    The original article contains 766 words, the summary contains 248 words. Saved 68%. I’m a bot and I’m open source!