Microsoft recently announced the public preview of built-in actions for document parsing and chunking in Logic Apps Standard. These actions are designed to streamline Retrieval-Augmented Generation (RAG)-based ingestion for Generative AI applications. With these actions, the company further invests in artificial intelligence capabilities for its low-code offering.
With these out-of-the-box operations, developers can, according to the company, ingest documents or files containing structured or unstructured data into Azure AI Search without writing or managing any code. The new Data Operations actions, "Parse a document" and "Chunk text," transform content from formats like PDF, CSV, and Excel into tokenized strings and split it into manageable chunks based on the number of tokens. This helps ensure compatibility with Azure AI Search and Azure OpenAI, which require tokenized input and have token limits.
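To make the idea of token-based chunking concrete, here is a minimal Python sketch of the general technique. It is not the Logic Apps implementation: the function name `chunk_by_tokens`, its `max_tokens` and `overlap` parameters, and the use of whitespace-separated words as "tokens" are all assumptions made for illustration. Production services typically rely on model-specific tokenizers instead.

```python
from typing import List


def chunk_by_tokens(text: str, max_tokens: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `max_tokens` tokens.

    Assumption: a "token" here is a whitespace-separated word, which only
    approximates what a real model tokenizer would produce.
    `overlap` (hypothetical parameter) repeats that many tokens between
    consecutive chunks so context is not lost at chunk boundaries.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in the range [0, max_tokens)")

    tokens = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        # Stop once the current window has reached the end of the text.
        if start + max_tokens >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    parsed_text = "Logic Apps can parse a document and split the text into chunks"
    for chunk in chunk_by_tokens(parsed_text, max_tokens=5, overlap=1):
        print(chunk)
```

Each resulting chunk stays under a fixed token budget, which is the property that matters when passing text to services with token limits.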
Divya Swarnkar, a program manager at Microsoft, writes:
These actions are built on the Apache Tika toolkit and parser libraries, allowing you to parse thousands of file types in multiple languages, such as PDF, DOCX, PPT, HTML, and more. You can seamlessly read and parse documents from virtually any source without custom logic or configuration!
(Source: Tech Community blog post)
Wessel Beulink, a cloud architect at Rubicon, concluded in a blog post on the new actions:
Azure Logic Apps’ document parsing and chunking capabilities unlock many automation possibilities. These features, from legal workflows to customer support, allow businesses to harness AI for more innovative document processing. By leveraging low-code RAG ingestion, organizations can simplify the integration of AI models, enabling smoother data ingestion, enhanced searchability, and more efficient knowledge management.
In his blog post, he describes several use cases: integrating the parsing features into AI workflows to streamline document processing, enabling AI-powered chatbots to ingest and retrieve relevant information for customer support, and improving knowledge management and searchability by breaking data down into manageable pieces.
In addition, Logic Apps provides ready-to-use templates for RAG ingestion, which makes it easy to connect familiar data sources like SharePoint, Azure File, SFTP, and Azure Blob Storage. These templates can help developers save time and customize workflows to fit their needs.
Kamaljeet Kharbanda, a master's student of data science, states in a Medium blog post that RAG transforms enterprise data processing by combining deep knowledge bases with the powerful analytical capabilities of large language models (LLMs). This synergy enables advanced interpretations of complex datasets, which is crucial for driving competitive advantage in today’s digital ecosystem.
Low-code/no-code platforms such as Azure AI Studio, Amazon Bedrock, Vertex AI, and Logic Apps make advanced AI functionalities accessible. Alongside these cloud solutions, tools like LangChain and LlamaIndex provide robust environments for implementing customized AI functionality through code-intensive methods.