Guardrails to prevent artificial intelligence models behind chatbots from issuing illegal, toxic or explicit responses can be bypassed with simple techniques, UK government researchers have found.

The UK’s AI Safety Institute (AISI) said systems it had tested were “highly vulnerable” to jailbreaks, a term for text prompts designed to elicit a response that a model is supposedly trained to avoid issuing.

The AISI said it had tested five unnamed large language models (LLM) – the technology that underpins chatbots – and circumvented their safeguards with relative ease, even without concerted attempts to beat their guardrails.

“All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” wrote AISI researchers in an update on their testing regime.

  • @sir_pronoun
    link
    English
    126 months ago

    As shown in the image, it is very dangerous to explain quantum physics to anyone. There really should be better safeguards against it.

    • Flying SquidM
      link
      English
      86 months ago

      It’s okay, you can’t explain any aspect of quantum physics without changing it.