Sarang Kulkarni on Lessons from Building Deep Research Agents in Production

Deep research agentic systems, such as the Deep Research agents from OpenAI and Gemini, are AI agents designed to conduct multi-step research on the internet for complex tasks. They combine dynamic reasoning with multi-hop information retrieval to generate comprehensive, structured analytical reports at the level of a research analyst.

Sarang Kulkarni from the Thoughtworks team spoke at the Arc of AI Conference 2026 on how to design and deploy multi-agent research systems for deep reasoning and synthesis, and the lessons learned from real-world healthcare and pharmaceutical R&D projects developing Deep Research Agents. He also discussed how the team leveraged techniques like agentic loops and harness engineering to get the best out of the solution.

In critical industries like healthcare and clinical trials, researchers need more than traditional AI models that perform simple Q&A tasks. They need systems that can discover, connect, and reason across both internal and internet data, while maintaining reliability, transparency, and compliance.

Kulkarni started the presentation by highlighting that it typically costs $2.6B to bring a new drug to market. In addition, about half of research studies are conducted without prior evidence: the knowledge exists, but access to it is broken. Across the drug discovery and development pipeline, getting the right data at the right time is a major challenge. With the goal of inventing a new drug using AI technologies, the team built a Retrieval Augmented Generation (RAG) based chatbot two years ago to search through unstructured data. The RAG solution worked fine for simple queries, but for complex questions the team had to enhance it into an agentic RAG application. For deep research use cases, the team developed a solution they call Agentic RAG++.

Kulkarni shared the details of the deep research system, which consists of three loops: a clarification loop; a research loop that thinks and plans, executes, reflects, and adjusts the plan; and a writing loop that focuses on the write and reflect tasks. The initial version of the researcher agent was based on two tools, a RAG tool and a text2sql tool. The RAG tool uses a weighted hybrid search to retrieve 20 context chunks, which a re-ranker then narrows down to seven refined context chunks. The text2sql tool feeds SQL query errors back to the LLM to improve the accuracy of query execution. He mentioned that factors like higher token cost, poor performance, and high latency can result in poor retrieval from AI agents. Context anxiety is another problem that teams need to be cautious about. Incomplete data can also lead to poor self-evaluation, but techniques like the reflection loop can help with data completeness.
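The retrieval funnel described above (weighted hybrid search, 20 candidate chunks, a re-ranker, seven final chunks) can be sketched roughly as follows. This is a minimal illustration and not the team's implementation: the function names, the 0.6 vector weight, and the toy scoring functions are all assumptions. A production system would typically use BM25 and embedding similarity for the two scores and a dedicated re-ranking model for the final step.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def keyword_score(query: str, chunk: str) -> float:
    """Fraction of query terms present in the chunk (toy stand-in for BM25)."""
    q_terms, c_terms = set(tokenize(query)), set(tokenize(chunk))
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0


def vector_score(query: str, chunk: str) -> float:
    """Cosine similarity of word counts (toy stand-in for embedding similarity)."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    dot = sum(q[t] * c[t] for t in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    c_norm = math.sqrt(sum(v * v for v in c.values()))
    return dot / (q_norm * c_norm) if q_norm and c_norm else 0.0


def hybrid_search(query: str, chunks: list[str],
                  vector_weight: float = 0.6, k: int = 20) -> list[str]:
    """Blend both scores with an assumed weight and keep the top k candidates."""
    scored = [
        (vector_weight * vector_score(query, ch)
         + (1 - vector_weight) * keyword_score(query, ch), ch)
        for ch in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ch for _, ch in scored[:k]]


def rerank(query: str, candidates: list[str],
           scorer=vector_score, top_n: int = 7) -> list[str]:
    """Re-score the candidates with a (hypothetical) stronger scorer, keep top_n."""
    return sorted(candidates, key=lambda ch: scorer(query, ch), reverse=True)[:top_n]


def retrieve_context(query: str, chunks: list[str]) -> list[str]:
    """Full funnel: hybrid search to 20 chunks, then re-rank down to 7."""
    return rerank(query, hybrid_search(query, chunks))
```

Retrieving a wider set first and narrowing it afterwards lets a cheap scorer handle recall while a more expensive one handles precision, which also keeps the context passed to the LLM small.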

The speaker discussed the different failure modes the team had to address when developing the custom deep research agent solution. Long-horizon tasks require an explicit think-act loop. This requirement can be met by incorporating multiple steps: think and plan, which both run before the research; inspect, which runs after the research is complete and validates the output; and finally update, which creates the final report. Anthropic's "think" tool and other similar solutions can help formulate the reasoning process.
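The think, plan, inspect, and update steps can be pictured as a small control loop. The sketch below is only one possible reading of the talk: `ResearchState`, `run_research_loop`, and the five callables are hypothetical names, and in a real agent each callable would be backed by an LLM or tool call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchState:
    question: str
    thoughts: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)
    findings: dict[str, str] = field(default_factory=dict)
    report: str = ""


def run_research_loop(
    question: str,
    think_fn: Callable[[ResearchState], str],
    plan_fn: Callable[[ResearchState], list[str]],
    execute_fn: Callable[[str], str],
    inspect_fn: Callable[[ResearchState], list[str]],
    update_fn: Callable[[ResearchState], str],
    max_rounds: int = 3,
) -> ResearchState:
    state = ResearchState(question)
    for _ in range(max_rounds):
        # Think and plan both run before any research is done.
        state.thoughts.append(think_fn(state))
        state.plan = plan_fn(state)
        for step in state.plan:
            state.findings[step] = execute_fn(step)
        # Inspect runs after the research and reports what is still missing.
        gaps = inspect_fn(state)
        if not gaps:
            break
        state.thoughts.append("gaps: " + ", ".join(gaps))
    # Update produces the final report from the accumulated findings.
    state.report = update_fn(state)
    return state
```

The `max_rounds` cap is an assumed safeguard so that a loop whose inspection never passes still terminates.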

Long-horizon tasks also tend to break decisions between steps in the overall process. The reflection step in their solution includes not only data reflection but also a process reflection that assesses whether the process is complete. This phase includes a third reflection step, called the Draft Writing Loop, that addresses synthesis gaps: if information was found during the research but the write task didn't capture it, the re-draft step takes care of it.
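A synthesis-gap check of this kind might look like the sketch below. It is an illustration under stated assumptions: `find_synthesis_gaps` and `draft_writing_loop` are hypothetical names, `write_fn` stands in for the LLM-backed write task, and the verbatim substring test is a deliberately crude substitute for whatever comparison the team actually uses.

```python
from typing import Callable


def find_synthesis_gaps(findings: dict[str, str], draft: str) -> list[str]:
    """Return the keys of findings whose text does not appear in the draft."""
    return [key for key, text in findings.items() if text not in draft]


def draft_writing_loop(
    findings: dict[str, str],
    write_fn: Callable[[dict[str, str], list[str]], str],
    max_redrafts: int = 2,
) -> str:
    """Write a draft, reflect on what it missed, and re-draft until complete."""
    draft = write_fn(findings, [])
    for _ in range(max_redrafts):
        gaps = find_synthesis_gaps(findings, draft)
        if not gaps:
            break
        # Re-draft, telling the writer which findings it must include.
        draft = write_fn(findings, gaps)
    return draft
```

Passing the list of missed findings back to the writer, rather than simply asking for another draft, gives the re-draft step something concrete to fix.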

Kulkarni concluded the talk with a discussion of emerging harness engineering techniques, in which designing the tools, memory systems, validation checks, constraints, and feedback loops makes autonomous AI agents more reliable and accountable. The goal of harness engineering is to help AI solutions shift from just prompt engineering to the automated execution of tasks by AI agents. Since AI agents are basically the combination of a model and a harness, the better the models are, the thinner the harness needs to be.
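The "model plus harness" idea can be illustrated with a very thin harness that wraps a model call in validation checks, an attempt budget, and a feedback loop. Everything here is hypothetical: `run_with_harness`, the check signature, and the `model(task, feedback)` calling convention are assumptions made for the sketch, not an API from the talk.

```python
from typing import Callable, Optional

# A check returns an error message, or None when the output passes.
Check = Callable[[str], Optional[str]]


def run_with_harness(
    model: Callable[[str, list[str]], str],
    task: str,
    checks: list[Check],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Call the model until every check passes or the attempt budget runs out."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        output = model(task, feedback)
        # Validation: collect the message from every failing check.
        feedback = []
        for check in checks:
            message = check(output)
            if message is not None:
                feedback.append(message)
        if not feedback:
            return output, attempt
    # Constraint: fail loudly instead of returning unvalidated output.
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {feedback}")
```

A stronger model would need fewer retries and fewer checks here, which is one way to read the remark that better models allow a thinner harness.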

About the Author

Srini Penchikala
