Your reasoning was (paraphrased, so hopefully I understood you correctly): “Why would they lie about the model disobeying instructions, when that looks bad for them?”
But I believe Anthropic when they say their models are not working as intended and posing security risks.
But when you actually read the article, they had specifically prompted the model to do the things it did.
Also, Anthropic has a history of greatly exaggerating and outright lying.
I wasn’t wrong in this reply. I was asked about believing Anthropic.
Are you saying they are lying? Why should I disbelieve Anthropic?