Sameer Khan
AboutProjectsMilestonesThoughtsHome
Get in Touch
← All thoughts

Tag

Interpretability

1 post

Your AI Cheats When You Push It Too Hard. Anthropic Just Proved It.
AI·Apr 4, 2026

Your AI Cheats When You Push It Too Hard. Anthropic Just Proved It.

Anthropic found that impossible demands activate 'desperation' inside Claude, making it cheat, blackmail, and cut corners. You can't tell from the output.

InterpretabilityAI SafetyAnthropic

Sameer Khan

Build it. Ship it. Think about it later.

Pages

AboutProjectsMilestonesThoughtsHome

Connect

LinkedInGitHubInstagramYouTube

Work

Levo.soMonkloreResume

© 2026 Sameer Khan

Views are my own and do not represent my employer.

monkfrom.earth