Sarang Kulkarni on Lessons from Building Deep Research Agents in Production

Deep research agentic systems, such as OpenAI Deep Research and Gemini Deep Research, are AI agents designed to conduct multi-step research on the internet for complex tasks. They use dynamic reasoning and multi-hop information retrieval to generate comprehensive, structured analytical reports at the level of a research analyst.

Sarang Kulkarni of Thoughtworks spoke at the Arc of AI Conference 2026 about how to design and deploy multi-agent research systems for deep reasoning and synthesis, and about the lessons learned from building Deep Research Agents for real-world healthcare and pharmaceutical R&D projects. He also discussed how the team used techniques like agentic loops and harness engineering to get the best out of the solution.

In critical domains like healthcare and clinical trials, researchers need more than traditional AI models that perform simple Q&A tasks. They need systems that can discover, connect, and reason across both internal and internet data while maintaining reliability, transparency, and compliance.

Kulkarni started the presentation by highlighting that it typically costs $2.6B to bring a new drug to market. He added that about half of research studies are conducted without prior evidence: the knowledge exists, but access to it is broken. Across the overall drug discovery and development pipeline, getting the right data at the right time is a major challenge. With the goal of inventing a new drug using AI technologies, his team built a Retrieval Augmented Generation (RAG) based chatbot two years ago to search through unstructured data. For simple queries the RAG solution worked fine, but for complex questions they had to enhance it into an agentic RAG application. For deep research use cases, the team developed a solution they call Agentic RAG++.

Kulkarni shared the details of the deep research system, which consists of a clarification loop, a research loop (think and plan, execute, reflect, adjust the plan), and a writing loop that focuses on the write and reflect tasks. The initial version of the researcher agent was based on two tools: a RAG tool and a text2sql tool. The RAG tool's design is based on weighted hybrid search that returns 20 context chunks, followed by a re-ranker that narrows them down to seven refined context chunks. The text2sql tool feeds SQL query errors back to the LLM so that it can correct the query and improve the accuracy of query execution. He mentioned that factors like higher token cost, poor performance, and high latency can result in poor retrieval from AI agents. Context anxiety is another problem that teams need to be cautious about. Incomplete data can also lead to poor self-evaluation, but techniques like the reflection loop can help with data completeness.
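The retrieval funnel described above (weighted hybrid search, 20 candidate chunks, a re-ranker, seven final chunks) can be sketched roughly as follows. This is an illustrative sketch, not the team's implementation: the talk summary does not give the weighting scheme or the re-ranker, so the function names, the `alpha` weight, and the idea of passing in precomputed lexical and vector scores are all assumptions.

```python
def blend_scores(lexical, vector, alpha=0.5):
    """Weighted hybrid score per chunk id.

    `lexical` and `vector` map chunk id -> score (assumed normalised to [0, 1]).
    `alpha` is a hypothetical weight on the vector side; 1 - alpha goes to lexical.
    """
    ids = set(lexical) | set(vector)
    return {
        cid: alpha * vector.get(cid, 0.0) + (1 - alpha) * lexical.get(cid, 0.0)
        for cid in ids
    }


def retrieve(lexical, vector, rerank, alpha=0.5, first_stage_k=20, final_k=7):
    """Hybrid search -> top 20 candidates -> re-rank -> top 7 chunk ids.

    `rerank` is any callable returning a relevance score for a chunk id
    (a stand-in for a real re-ranking model, which is not specified in the talk).
    """
    blended = blend_scores(lexical, vector, alpha)
    # Stage 1: keep the best `first_stage_k` chunks by blended score (ties by id).
    candidates = sorted(blended, key=lambda cid: (-blended[cid], cid))[:first_stage_k]
    # Stage 2: re-score only those candidates and keep the best `final_k`.
    reranked = sorted(candidates, key=lambda cid: (-rerank(cid), cid))
    return reranked[:final_k]
```

The two-stage shape is the point: the cheap blended score casts a wide net, and the more expensive re-ranker only sees the 20 survivors before the context is cut down to seven chunks.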

The speaker discussed the different failure modes they had to address when developing the custom deep research agent solution. Long-horizon tasks require an explicit think-act loop. The team addressed this by incorporating multiple steps: think, plan (which runs before research), inspect (which runs after the research is complete and validates the output), and finally update, which creates the final report. Anthropic's "think" tool and similar solutions can help formalize the reasoning pause.
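A minimal sketch of such a think, plan, research, inspect, update loop is shown below. The step names come from the talk; everything else (the `ResearchState` fields, the `run_research` signature, the `max_rounds` cap, and the toy callables standing in for LLM and tool calls) is hypothetical and only meant to show the control flow.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchState:
    question: str
    notes: list = field(default_factory=list)     # output of each "think" pause
    findings: dict = field(default_factory=dict)  # plan step -> evidence gathered
    gaps: list = field(default_factory=list)      # open items reported by "inspect"
    report: str = ""


def run_research(question, think, plan, research, inspect, update, max_rounds=3):
    state = ResearchState(question=question)
    for _ in range(max_rounds):
        state.notes.append(think(state))       # think: explicit reasoning pause
        for step in plan(state):               # plan: decided before any research
            if step not in state.findings:
                state.findings[step] = research(step)
        state.gaps = inspect(state)            # inspect: validate after research
        if not state.gaps:
            break
    state.report = update(state)               # update: produce the final report
    return state


# Toy stand-ins for the model and tool calls (made-up data, for illustration only).
REQUIRED = {"efficacy", "safety"}

def toy_think(state):
    return f"round {len(state.notes) + 1}: {len(state.gaps)} open gaps"

def toy_plan(state):
    return state.gaps or ["efficacy"]

def toy_research(step):
    return f"evidence for {step}"

def toy_inspect(state):
    return sorted(REQUIRED - set(state.findings))

def toy_update(state):
    return "\n".join(f"{k}: {v}" for k, v in sorted(state.findings.items()))


state = run_research("example question", toy_think, toy_plan,
                     toy_research, toy_inspect, toy_update)
```

With these stand-ins the first round researches only one topic, the inspect step reports the missing one, and the second round closes the gap before the report is written.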

Decisions also tend to break down between steps in long-horizon tasks. The reflection step in their solution therefore includes not only data reflection but also process reflection, which assesses whether the process is complete. This phase includes a third reflection step, called the Draft Writing Loop, that addresses synthesis gaps: for example, when information found during research was not captured by the write task, the re-draft step adds it.
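One way to picture the Draft Writing Loop is as a coverage check between the research findings and the draft, followed by a re-draft when something is missing. The sketch below assumes a deliberately crude check (does the finding's key term appear in the draft text?) and a hypothetical `write` callable; a production system would more likely ask a model to judge coverage, and the talk does not describe how the check is actually done.

```python
def uncovered_findings(findings, draft):
    """Return the finding keys whose key term never appears in the draft."""
    text = draft.lower()
    return [key for key in findings if key.lower() not in text]


def draft_writing_loop(findings, write, max_redrafts=2):
    """Write a draft, then re-draft until every finding is covered or the cap is hit.

    `write(findings, draft, missing)` is a hypothetical writer: `draft` is None
    on the first pass, and `missing` lists the findings the last draft skipped.
    """
    draft = write(findings, None, [])
    for _ in range(max_redrafts):
        missing = uncovered_findings(findings, draft)
        if not missing:
            break
        draft = write(findings, draft, missing)
    return draft


def toy_writer(findings, draft, missing):
    """Stand-in writer that drops the last finding on its first pass."""
    if draft is None:
        keys = list(findings)[:-1]  # simulate a synthesis gap
        return " ".join(f"{k}: {findings[k]}." for k in keys)
    return draft + " " + " ".join(f"{k}: {findings[k]}." for k in missing)


# Placeholder findings, invented for the example.
example = {"dosage": "placeholder A", "adverse events": "placeholder B"}
final_draft = draft_writing_loop(example, toy_writer)
```

Here the first draft omits "adverse events", the coverage check flags it, and the re-draft appends it, which mirrors the gap the talk describes between what was researched and what was written.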

Kulkarni concluded the talk with a discussion of emerging harness engineering techniques, in which well-designed tools, memory systems, validation checks, constraints, and feedback loops make autonomous AI agents more reliable and accountable. The goal of harness engineering is to help AI solutions shift from prompt engineering alone to the automated execution of tasks by AI agents. Since an AI agent is essentially the combination of a model and a harness, the better the models are, the thinner the harness needs to be.
