TreeQuest by Sakana AI: How Multi-Model AI Teams Outperform Solo LLMs on ARC-AGI-2
Unlocking the Power of AI Collaboration
Sakana AI, a leading Japanese AI lab, has unveiled TreeQuest, a pioneering technique that enables multiple large language models (LLMs) to work together on a single task.
- This approach, known as Multi-LLM AB-MCTS (Adaptive Branching Monte Carlo Tree Search), transforms traditional AI workflows by building collaborative “dream teams” of models.
- By leveraging the unique strengths and compensating for the weaknesses of individual LLMs, TreeQuest delivers solutions to complex problems previously out of reach for any single model.
Why Collective Intelligence Beats Individual Models
Every state-of-the-art LLM has distinct biases, strengths, and weaknesses—one may be exceptional at code, another at language, and another at logic.
- Sakana AI argues these differences are assets rather than flaws, offering a basis for collective intelligence similar to how diverse human teams outperform individuals.
- By pooling intelligence, models can tackle problems beyond the reach of any single model; on the challenging ARC-AGI-2 benchmark, the collective solved over 30% of the test problems, more than any individual model achieved alone.
Inference-Time Scaling: More Than Just Bigger Models
Most AI innovation has focused on training-time scaling—making models larger and training them on more data.
- Inference-time scaling improves performance by allocating more computational resources or smarter strategies after training.
- Sakana AI’s approach refines this idea: instead of simply sampling a model many times and keeping the best answer (“Best-of-N” sampling), TreeQuest uses adaptive branching to decide at each step whether to deepen the search (refine a promising idea) or widen it (try something new), as sketched below.
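To make the contrast concrete, here is a minimal Python sketch of the two strategies. The `generate` and `score` functions are hypothetical stand-ins for an LLM call and an answer evaluator, and the coin flip in the adaptive version is a deliberate simplification: AB-MCTS makes the deepen-vs-widen choice with a principled Bayesian decision, not a fixed probability.

```python
import random

# Hypothetical stand-ins: generate() drafts an answer (optionally refining a
# parent), score() returns a quality estimate in [0, 1]. Neither is part of
# TreeQuest's real API; they exist only to illustrate the two strategies.

def best_of_n(generate, score, n=8):
    """Best-of-N: sample n independent answers, keep the highest-scoring one."""
    candidates = [generate(parent=None) for _ in range(n)]
    return max(candidates, key=score)

def adaptive_search(generate, score, budget=8, explore_prob=0.5):
    """Toy adaptive branching: at each step, either widen (fresh answer)
    or deepen (refine the current best), instead of always widening."""
    best = generate(parent=None)
    for _ in range(budget - 1):
        if random.random() < explore_prob:
            candidate = generate(parent=None)   # widen: brand-new attempt
        else:
            candidate = generate(parent=best)   # deepen: refine the leader
        if score(candidate) > score(best):
            best = candidate
    return best
```

Best-of-N spends its entire budget on independent attempts; the adaptive variant can reinvest budget in a promising answer, which is what lets the search recover partial progress instead of discarding it.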
How Adaptive Branching Monte Carlo Tree Search (AB-MCTS) Works
At the core is AB-MCTS, which blends two strategies:
- Searching deeper: Taking a promising answer and refining it iteratively.
- Searching wider: Generating entirely new ideas from scratch.
Using Monte Carlo Tree Search, TreeQuest dynamically decides whether to refine or innovate, and—critically—which model should handle each stage.
- Early in a task, the system tries all of the models, learns which ones excel in which situations, and then routes more of the work to the strongest performers as the search progresses (a toy version of the selection step follows below).
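The toy sketch below shows the core selection idea: at every node of the search tree, the option "spawn a fresh child answer here" (widen) competes against "descend into the best existing child and refine it" (deepen), with both options scored by sampling from simple Beta posteriors (Thompson sampling). This is our illustrative approximation of the mechanism Sakana AI describes, not TreeQuest's actual internals.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    answer: str
    wins: int = 1                                  # Beta(1, 1) prior
    losses: int = 1
    children: list = field(default_factory=list)

def select(node, widen_prior=(1, 1)):
    """Walk down the tree; at each node, 'generate a new child here' (widen)
    competes against the best existing child (deepen) via posterior draws."""
    while node.children:
        widen_draw = random.betavariate(*widen_prior)   # value of a fresh attempt
        child_draws = [(random.betavariate(c.wins, c.losses), c)
                       for c in node.children]
        best_draw, best_child = max(child_draws, key=lambda t: t[0])
        if widen_draw > best_draw:
            return node        # widen: expand a brand-new child under this node
        node = best_child      # deepen: keep refining the most promising branch
    return node                # leaf reached: the only option left is to expand
```

After `select` returns a node, the system would call a model to produce a new child answer there and propagate the resulting score back up the tree, as in standard MCTS backpropagation.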
Real-World Impact: Outperforming Single Models
Sakana AI tested Multi-LLM AB-MCTS on the challenging ARC-AGI-2 benchmark, using a team of advanced models: o4-mini, Gemini 2.5 Pro, and DeepSeek-R1.
- The collaborative approach solved over 30% of 120 test problems, a result that significantly exceeded the performance of any single model.
- In practice, the system learned to allocate each subtask to the LLM best suited for it, and even let one model correct another’s mistakes, mirroring human teamwork (a sketch of this allocation logic appears after this list).
- This teamwork approach also reduces hallucinations, as models with strong logical grounding can correct more creative, error-prone responses.
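One simple way to get this "learn who is good at what" behavior, roughly in the flavor of what the article describes, is Thompson sampling over a per-model success posterior. The bookkeeping below is our own sketch; only the model names come from the experiment above.

```python
import random

# Sketch of per-model allocation: keep a Beta posterior over each LLM's
# success rate and pick the next model by Thompson sampling, so early steps
# explore all models and later steps favor whichever is actually working.
posteriors = {name: [1, 1]  # [wins + 1, losses + 1], i.e. Beta(1, 1) prior
              for name in ("o4-mini", "Gemini 2.5 Pro", "DeepSeek-R1")}

def pick_model():
    """Draw one plausible success rate per model, pick the highest draw."""
    draws = {m: random.betavariate(a, b) for m, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

def record(model, solved):
    """Update the chosen model's Beta posterior with the observed outcome."""
    posteriors[model][0 if solved else 1] += 1
```

Early on, the flat Beta(1, 1) priors make every model equally likely to be picked; as wins and losses accumulate, the draws concentrate around the models that actually solve their subtasks.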
Open-Source for Enterprise AI
To drive adoption, Sakana AI has released TreeQuest as an open-source framework under the Apache 2.0 license, making it commercially viable for businesses and developers.
- The framework provides an easy-to-use API, enabling organizations to integrate Multi-LLM AB-MCTS into their workflows for custom tasks, from complex coding to optimizing software performance metrics (see the usage sketch after this list).
- Early applications include improving algorithmic coding, machine learning accuracy, and even web service optimization.
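For orientation, here is a minimal usage sketch in the shape of TreeQuest's generate-function pattern. The names `ABMCTSA`, `init_tree`, `step`, and `top_k` reflect our reading of the open-source repo and should be verified against the current documentation; the generator below is a dummy standing in for a real LLM call plus an evaluator.

```python
import random
import treequest as tq  # pip install treequest (Apache 2.0)

def generate(parent_state=None):
    """Produce a new candidate, optionally refining parent_state, and score
    it in [0, 1]; in practice this would call an LLM and a quality metric."""
    new_state = (parent_state or "") + random.choice("abc")
    score = random.random()  # placeholder for a real evaluator
    return new_state, score

algo = tq.ABMCTSA()                    # an adaptive branching MCTS variant
tree = algo.init_tree()
for _ in range(30):                    # each step deepens or widens the tree
    tree = algo.step(tree, {"draft": generate})

best_state, best_score = tq.top_k(tree, algo, k=1)[0]
print(best_state, best_score)
```

Because the score returned by each generator drives the deepen-vs-widen decision, the repo's multi-model examples appear to work by passing one generator per LLM in that action dictionary, letting the search learn which generator to favor.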
What’s Next for Multi-Model AI Systems?
TreeQuest and AB-MCTS represent a shift from model-centric to team-centric AI development.
- By combining LLMs, enterprises can build more robust, flexible, and accurate AI systems, while reducing risks such as model hallucination.
- As AI regulation and complexity grow, this approach offers a scalable, reliable, and transparent path for deploying advanced AI in business contexts.