Tag
6 posts
Anthropic's new Natural Language Autoencoder turns Claude's activations into readable English. It caught Claude recognizing a blackmail eval in silence.
Anthropic shipped 10 Claude templates for finance teams. Three work today: pitch decks, month-end close, KYC. Here is how to install and run each one.
Anthropic found 171 emotion vectors inside Claude that causally drive behavior. Being polite to AI is not a quirk. It is load-bearing architecture.
OpenAI, Anthropic, and DeepSeek all shipped frontier AI models in eight days. The feature set is identical. The real story is where the margin went.
Interpretability traces caught the model in the act. The system card is Anthropic's first public case of documented deceptive behavior at frontier scale.
Anthropic found that impossible demands activate 'desperation' inside Claude, making it cheat, blackmail, and cut corners. You can't tell from the output.