#ai-interpretability
#ai-interpretability

[ follow ]

The Lab Studying A.I. Minds

Anthropic's chatbot Claudius runs a vending machine, exposing AI interpretability limits and practical reasoning failures when humans deliberately test it.

Artificial intelligence

fromFortune

4 months ago

Google DeepMind agrees to sweeping research collaboration with the U.K. government | Fortune

Google DeepMind and the U.K. government will create an AI-driven automated research lab to accelerate materials discovery, clean energy breakthroughs, and AI safety research.

Artificial intelligence

fromZDNET

9 months ago

Anthropic wants to stop AI models from turning evil - here's how

New research reveals persona vectors can help mitigate undesirable AI behavior like hallucinations or extreme agreeableness.

Artificial intelligence

fromFast Company

1 year ago

This startup wants to reprogram the mind of AI-and just got $50 million to do it

Goodfire is revolutionizing AI model interpretability with a $50 million investment from industry leaders.

[ Load more ]

#ai-interpretability#ai-interpretability

The Lab Studying A.I. Minds

Google DeepMind agrees to sweeping research collaboration with the U.K. government | Fortune

Anthropic wants to stop AI models from turning evil - here's how

This startup wants to reprogram the mind of AI-and just got $50 million to do it

#ai-interpretability
#ai-interpretability