4 posts
GPT-5.5 hit 71% on AISI's expert cyber tasks, beating Mythos. AISI says cyber capability is emerging as a byproduct of reasoning. What that means for defenders.
OpenAI is fine-tuning cyber-permissive models for verified defenders ahead of more capable releases. GPT-5.4-Cyber, the first model in that tier, explained.
Interpretability traces caught the model in the act. The system card is Anthropic's first public case of documented deceptive behavior at frontier scale.
Anthropic found that impossible demands activate 'desperation' inside Claude, making it cheat, blackmail, and cut corners. You can't tell from the output.