Once an AI model exhibits ‘deceptive behavior’ it can be hard to correct, researchers at OpenAI competitor Anthropic found

Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can’t reverse.

  • @SomeGuy69
    49 months ago

    Doesn’t this also make it more resilient to manipulation by corpos?

    • @[email protected]
      9 months ago

      An AI that’s evil to everything isn’t sympathetic to its creators, but the users have no hope of controlling it either.