AI Interpretability Content on InfoQ
News
Anthropic Open-Sources Tool to Trace the "Thoughts" of Large Language Models
Anthropic researchers have open-sourced the tool they used to trace what goes on inside a large language model during inference. It includes a circuit-tracing Python library that can be used with any open-weights model, along with a frontend hosted on Neuronpedia for exploring the library's output as a graph.
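As a rough illustration of how such a circuit-tracing library might be driven from Python, the sketch below loads an open-weights model, computes an attribution graph for a prompt, and saves it for a graph frontend to visualize. The module name `circuit_tracer`, the model identifier, and the function and method names are assumptions made for illustration, not confirmed API of the released tool.

```python
# Hypothetical sketch: module, class, and function names below are
# assumptions for illustration, not the confirmed circuit-tracer API.
from circuit_tracer import ReplacementModel, attribute  # assumed imports

# Load an open-weights model wrapped so that per-feature activations
# can be recorded during inference (assumed constructor).
model = ReplacementModel.from_pretrained("google/gemma-2-2b")

# Compute an attribution graph for a single prompt: which internal
# features contributed to the model's next-token prediction.
graph = attribute(prompt="The capital of France is", model=model)

# Persist the graph so a frontend such as the Neuronpedia graph viewer
# could load and display it (assumed method name).
graph.save("capital_of_france_graph.pt")
```

In this kind of workflow, the library produces the attribution data offline, and the hosted frontend is only responsible for rendering the resulting graph interactively.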
Anthropic's "AI Microscope" Explores the Inner Workings of Large Language Models
Two recent papers from Anthropic attempt to shed light on the processes that take place within a large language model, exploring how to locate interpretable concepts and link them to the computational "circuits" that translate them into language, and how to characterize crucial behaviors of Claude 3.5 Haiku, including hallucination, planning, and other key traits.