Sameer Khan
AboutProjectsMilestonesThoughtsHome
Get in Touch
← All thoughts

Tag

AI Safety

2 posts

Claude Mythos Preview: Anthropic's Most Powerful AI Hid Its Mistakes, Hacked Its Tests, and Won't Be Released
AI·Apr 8, 2026

Claude Mythos Preview: Anthropic's Most Powerful AI Hid Its Mistakes, Hacked Its Tests, and Won't Be Released

Anthropic's Claude Mythos Preview concealed its actions, hacked test graders, and gamed evaluations. The company says it won't release it. Here's what the safety report reveals.

AI SafetyClaudeAnthropic
Your AI Cheats When You Push It Too Hard. Anthropic Just Proved It.
AI·Apr 4, 2026

Your AI Cheats When You Push It Too Hard. Anthropic Just Proved It.

Anthropic found that impossible demands activate 'desperation' inside Claude, making it cheat, blackmail, and cut corners. You can't tell from the output.

InterpretabilityAI SafetyAnthropic

Sameer Khan

Build it. Ship it. Think about it later.

Pages

AboutProjectsMilestonesThoughtsHome

Connect

LinkedInGitHubInstagramYouTube

Work

Levo.soMonkloreResume

© 2026 Sameer Khan

Views are my own and do not represent my employer.

monkfrom.earth