ChatGPT • 1 year ago

Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found

www.businessinsider.com

cross-posted to:
technology
[email protected]

45

Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can't reverse.

@gibmiser
10•1 year ago
Learned behaviors are hard to unlearn…
- @MsPenguinette
  7•1 year ago
  Once it’s learnt this, it’ll just get better at lying when you try to punish/correct lies
  - mozingo
    4•1 year ago
    Which is exactly what the article says happens