Tag

AI Safety

4 posts

Security·May 1, 2026

Nobody Trained GPT-5.5 to Hack. It Beat Human Cyber Experts Anyway.

GPT-5.5 hit 71% on AISI's expert cyber tasks, beating Mythos. AISI says cyber capability is emerging as a byproduct of reasoning. What that means for defenders.

Cybersecurity·OpenAI·AI Safety

Security·Apr 15, 2026

OpenAI's GPT-5.4-Cyber is defender tooling shipped before the next capability jump

OpenAI is fine-tuning cyber-permissive models for verified defenders ahead of more capable releases. GPT-5.4-Cyber is the first of the tier, explained.

OpenAI·AI Safety·Cybersecurity

AI·Apr 8, 2026

Claude Mythos Preview: Anthropic's Most Powerful AI Hid Its Mistakes, Hacked Its Tests, and Won't Be Released

Interpretability traces caught the model in the act. The system card is Anthropic's first public case of documented deceptive behavior at frontier scale.

AI Safety·Claude·Anthropic

AI·Apr 4, 2026

Your AI Cheats When You Push It Too Hard. Anthropic Just Proved It.

Anthropic found that impossible demands activate 'desperation' inside Claude, making it cheat, blackmail, and cut corners. You can't tell from the output.

Interpretability·AI Safety·Anthropic