
Intel DeepMath Introduces a Smart Architecture to Make LLMs Better at Math

Intel has announced DeepMath, a lightweight agent built on Qwen3-Thinking that specializes in solving mathematical problems. To address common limitations of LLMs in math reasoning, DeepMath generates small Python scripts that support and enhance its problem-solving process.

According to Intel, mathematical problems remain challenging for large language models, which often produce verbose explanations and incorrect arithmetic. To overcome this, Intel researchers experimented with a new agent architecture that uses small Python executors as intermediate steps in the LLM reasoning process:

DeepMath is built on Qwen3-4B Thinking and fine-tuned with GRPO (Group Relative Policy Optimization). Instead of verbose text, the model emits tiny Python snippets for intermediate steps, runs them in a secure sandbox, and folds the results back into its reasoning, reducing errors and output length.
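The article does not publish DeepMath's inner loop, but the described behavior (emit a snippet, execute it in a sandbox, fold the output back into the reasoning) can be sketched roughly as follows. The `solve` function, the `generate` callback, and the transcript format are illustrative assumptions, not the actual DeepMath API:

```python
import re
import subprocess
import sys
import tempfile

# Matches a fenced Python snippet emitted by the model.
CODE_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)

def solve(problem: str, generate, max_steps: int = 8) -> str:
    """Illustrative reason-with-executor loop.

    `generate` is a stand-in for the LLM call: it takes the transcript
    so far and returns the model's next message.
    """
    transcript = problem
    for _ in range(max_steps):
        message = generate(transcript)
        match = CODE_BLOCK.search(message)
        if match is None:
            return message  # no snippet emitted: treat as the final answer
        # Run the emitted snippet in a separate process (a stand-in for
        # DeepMath's sandbox) and fold stdout back into the context.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(match.group(1))
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=5)
        transcript += message + f"\n[executor output]\n{result.stdout}"
    return transcript
```

In this sketch the executor output is simply appended to the context, so the next generation step can condition on exact numerical results instead of recomputing them in natural language.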

For example, when asked to find all positive integer pairs x and y such that y*y*x / (x + y) is a prime number, the model generates and runs a Python executor that iterates over small values of y to produce the correct output:

from sympy import isprime

solutions = []
for y in range(1, 10):        # try small y values
  for d in range(1, y**2):    # d < y^2 so that p = y^2 - d > 0
    if y**3 % d == 0:         # d must divide y^3
      p = y**2 - d
      if isprime(p):
        x = (y**3 // d) - y
        if x > 0:
          solutions.append((x, y))
print(solutions)
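The search over divisors in the snippet follows from a short algebraic identity (a sketch, not spelled out in Intel's post):

```latex
\text{Let } s = x + y. \text{ Then } \frac{x\,y^2}{x+y} = \frac{(s-y)\,y^2}{s} = y^2 - \frac{y^3}{s},
```

so the expression is an integer only when s divides y^3. Writing s = y^3 / d for a divisor d of y^3 gives the value y^2 - d and x = y^3/d - y, which is exactly what the code enumerates: divisors d of y^3 with y^2 - d prime and x positive.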

Based on evaluation across four distinct datasets, MATH500, AIME, HMMT, and HLE, Intel claims that the math agent reduces output length by up to 66% while often improving accuracy, with further performance gains achieved through the use of GRPO.

GRPO training introduces rewards for correct answers and for generating code snippets, encourages shorter answers, and varies the sampling temperature during training, promoting exploration in the early phases and reducing it as the model becomes more proficient. Training uses the Tool-Integrated Reasoning (TIR) subset of the OpenMathReasoning dataset and relies on answers only, along with four examples that include tool calls and executor output so the model can learn the expected format in context.
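As a rough illustration of the reward shaping and temperature schedule described above (the weights and the linear annealing are invented for the sketch, not Intel's published values):

```python
def reward(correct: bool, used_code: bool, num_tokens: int,
           target_len: int = 2048) -> float:
    """Toy reward combining the three signals the article mentions:
    correctness, code-snippet usage, and brevity. Weights are made up."""
    r = 1.0 if correct else 0.0
    r += 0.2 if used_code else 0.0
    # Brevity bonus: grows as the answer shrinks below target_len.
    r += 0.1 * max(0.0, 1.0 - num_tokens / target_len)
    return r

def temperature(step: int, total_steps: int,
                t_start: float = 1.0, t_end: float = 0.6) -> float:
    """Linearly anneal sampling temperature: explore early, exploit late."""
    frac = min(step / max(total_steps, 1), 1.0)
    return t_start + (t_end - t_start) * frac
```

In GRPO the reward is compared across a group of sampled completions for the same problem, so even a small brevity or code-usage bonus shifts probability mass toward shorter, tool-using solutions.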

However, Intel notes that the most significant gains come from using Python executors to offload deterministic computation, which LLMs handle poorly, thereby reducing arithmetic and numerical errors and shortening reasoning traces thanks to code brevity.

The Python environment used to run executors is sandboxed, allowing only modules included in an allow-list. Each snippet is subject to an execution timeout, and no file or network access is permitted. However, for production deployments it is crucial to carefully manage attack surfaces, enforce rate limits, maintain isolation using containers or VMs, monitor resource usage, and validate generated code before execution.
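A minimal sketch of that kind of sandbox policy, an import allow-list plus a per-snippet timeout in a separate process, might look like the following. The module names and limits are assumptions for illustration, not DeepMath's actual configuration:

```python
import ast
import subprocess
import sys
import tempfile

# Hypothetical allow-list; a real deployment would tune this carefully.
ALLOWED_MODULES = {"math", "fractions", "sympy"}

def run_snippet(code: str, timeout: float = 5.0) -> str:
    """Statically check imports, then run the snippet with a hard timeout."""
    # Reject imports outside the allow-list before executing anything.
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        for name in names:
            if name not in ALLOWED_MODULES:
                raise ValueError(f"module not allowed: {name}")
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # -I runs the interpreter in isolated mode; the timeout kills runaway
    # snippets. Real deployments would add container/VM isolation and
    # resource limits on top, as the article recommends.
    result = subprocess.run([sys.executable, "-I", path],
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout
```

Note that a static import check alone is not a security boundary (builtins like `open` still work), which is why the article stresses process isolation, rate limits, and monitoring for production use.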

DeepMath is available on GitHub and Hugging Face.
