Researchers showed that, although large language models activate their full network on every task, Theory-of-Mind reasoning is carried out by a small, specialized subset of parameters.
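To make the claim concrete, below is a minimal sketch of the kind of causal ablation study that can support it: score parameters for task relevance, zero out the top fraction, and check whether the target task degrades far more than a matched control. Everything in the sketch (the untrained toy model, the |weight × gradient| importance score, the 1% sparsity level, the random stand-in datasets) is an illustrative assumption, not the researchers' actual method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a language model (illustrative assumption).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

# Random stand-ins for a Theory-of-Mind task and a matched control task.
tom_x, tom_y = torch.randn(128, 16), torch.randint(0, 2, (128,))
ctl_x, ctl_y = torch.randn(128, 16), torch.randint(0, 2, (128,))

def task_loss(x, y):
    return loss_fn(model(x), y)

# 1) Importance: first-order attribution |weight * dLoss/dweight| on the ToM task.
model.zero_grad()
task_loss(tom_x, tom_y).backward()
scores = torch.cat([(p * p.grad).abs().flatten() for p in model.parameters()])

# 2) Ablate the top 1% most ToM-relevant parameters (the candidate
#    "specialized subset"); the 1% threshold is an assumption.
k = max(1, int(0.01 * scores.numel()))
threshold = scores.topk(k).values.min()
with torch.no_grad():
    offset = 0
    for p in model.parameters():
        n = p.numel()
        mask = (scores[offset:offset + n] >= threshold).view_as(p)
        p.masked_fill_(mask, 0.0)  # zero out the selected parameters
        offset += n

# 3) If the subset were genuinely specialized, ToM loss would degrade
#    much more than control loss after ablation.
with torch.no_grad():
    print("ToM loss after ablation:    ", task_loss(tom_x, tom_y).item())
    print("Control loss after ablation:", task_loss(ctl_x, ctl_y).item())
```

On a trained model with real Theory-of-Mind and control evaluations, a sharp asymmetry between the two post-ablation scores, despite removing only a tiny fraction of weights, is the signature of the specialization the finding describes.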