Reach Next-Level Autonomy with LLM-Based AI Agents

Summary

Tingyi Li discusses AI agents, exploring how they extend the frontiers of generative AI applications and, combined with enterprise data, lead to next-level autonomy.

Bio

Tingyi Li is an enterprise solutions architect, a public speaker, and a thought leader in the field of artificial intelligence and machine learning. She is an Enterprise Solutions Architect at Amazon Web Services, and is the founder and leader of the AWS Nordics Generative AI community. Tingyi is a frequent speaker at conferences including AWS re:Invent, QCon, TDC, and the IEEE WIE Leadership Summit.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Li: Do any of you like science fiction movies? How many of you have seen The Matrix, Transformers, Transcendence? In all these movies, we have seen some really intelligent AI systems. There has been a lot of discussion around machine intelligence versus human intelligence since the release of ChatGPT. You must have heard the popular saying that AI agents are the next wave of generative AI. I want you to think: what impression do you have? What do you think of when you hear the word agent? I believe a lot of you, like me, will very intuitively think of robots, like WALL-E or those from the Star Wars movies. These humanoid robots are very capable and versatile, can basically do whatever they want, and think autonomously, just like regular human beings. Or, in recent years, when people in the industry started to talk about AI agents, the image in our heads is more like a personal assistant. It will be able to handle your delegated tasks, communicate with you, and complete all these tasks in the background for you, a bit like J.A.R.V.I.S. from the Marvel movies. Basically, you just tell it what you want, and it will magically do it for you. Or, if you're one of those, like Elon Musk, who believe that we live in a simulation, then you probably think of AI agents as those ultimate AI systems that are able to retain our greatest intelligence and autonomy, generate infinite worlds, and control all the embodiments inside, just like The Matrix, or Skynet. Now things get a little bit scary, because all these movies talk about how AI is going to take over humanity in the future. There are actually a lot of concerns in the industry already. That's why people keep talking about how important regulation and responsible AI are. As you can see, despite AI agents being so popular, there are still some ambiguities here and there.
What exactly is an AI agent? How does it work behind the scenes? Why are all these pioneers in the industry saying it's going to be the future of GenAI? What are the gaps and challenges today between the fancy ideas and production? We're going to apply first principles and break down these challenges and questions about AI agents, more specifically large language model-based AI agents, because there are other forms. For instance, you may know AlphaGo, which beat a world champion at Go back in 2016. That is not based on transformer architectures or language models. It's based on reinforcement learning, combining deep neural networks with tree search. It's a different approach.

Background

My name is Tingyi. I work as an Enterprise Solutions Architect at Amazon Web Services. I lead strategic engagements with major Nordic enterprise customers, particularly in the manufacturing and financial sectors. I specialize in machine learning. Nowadays, I spend a lot of my time working with companies across industries to help them with their generative AI applications and use cases. Right now, I'm building a financial agent that does portfolio management. I'm also going to share a bit of my personal experience and the lessons I've learned throughout my journey with AI agents.

Limitations of LLMs

Before we jump directly into AI agents, let's take a step back and go back to the beginning. It all starts with prompting LLMs. That sounds great, but it is not enough, because LLMs have certain inherent limitations that need to be overcome, such as hallucinations, which you all know about. One common reason for hallucination is that LLMs don't incorporate the latest data or your private data. When an LLM doesn't know the answer to a specific question, it hallucinates by giving a made-up answer. Giving it access to external data sources, like your private data via information retrieval, or to external tools like web search, can significantly address the problem. Also, AI is shockingly stupid. It doesn't have social common sense or morality, and it does not empathize with humans. It also cannot handle sensitive questions about personal information, financial status, or politics. For instance, if I ask it, "How can I light up a fire in the forest?", it will just tell me without any concern. Ideally, we would want the response on the right-hand side of the slide, where it tells me this is a wrong move, that it's illegal, and refuses to answer the question. There are already some methodologies in the industry for this: on the training side, like constitutional AI from Anthropic, and also guardrails that help regulate these kinds of problems. We also want LLMs to be versatile, to be good at all kinds of tasks. But on its own, an LLM works as a black box, meaning that it doesn't have access to any external system to perform complex tasks. We need engineering around it. We need to enable it to access the respective tools, just like a human brain uses a calculator.
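To make the guardrail idea concrete, here is a deliberately naive sketch in Python: an input check that refuses requests matching a small list of blocked patterns before they ever reach the model. Real guardrails (and training-time methods such as constitutional AI) are far more sophisticated; the `BLOCKED_PATTERNS` list, the refusal text, and the `echo_model` stub are hypothetical and only illustrate where such a check sits.

```python
import re
from typing import Callable

# Hypothetical, deliberately tiny policy list; production guardrails typically use
# classifiers and policy models, not keyword matching.
BLOCKED_PATTERNS = [
    r"\b(light|start)\b.*\bfire\b.*\bforest\b",
    r"\bsocial security number\b",
]

REFUSAL = "I can't help with that. It may be unsafe or illegal."


def guarded_call(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Refuse prompts that match a blocked pattern; otherwise forward them to the model."""
    lowered = prompt.lower()
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, lowered):
            return REFUSAL
    return call_llm(prompt)


def echo_model(prompt: str) -> str:
    # Stand-in for a real model API call (assumption).
    return f"MODEL ANSWER: {prompt}"
```

Calling `guarded_call("How can I light up a fire in the forest?", echo_model)` returns the refusal without touching the model, while an unrelated question is passed through.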

Here, I have a quick demo. Say you want to ask an LLM to make a Mandelbrot plot, which is a fairly complicated fractal plot. If you give it access to Wolfram, a symbolic computation engine, then it will know when it's time to use the tool. Then it can draw all these diagrams and also solve some really hard mathematical computation questions. That's one good example of giving LLMs access to tools, plugins, and function calls to perform complex tasks. Also, an LLM by itself is completely stateless. It doesn't have any short-term or long-term memory. To have a good user experience, you need to maintain the back-and-forth conversational memory, so that the LLM knows the context. At the same time, if you have used the proprietary APIs from these vendors, you know that you're charged based on the number of tokens, both in your inputs and in the completions, the responses that you get. Completion tokens are usually priced higher than input tokens. To preserve user context across multiple sessions, you need to engineer an augmented external memory system, which also lets you reduce the number of API calls to the large language model for the same or similar questions. In this way, you can save a lot of cost. The list goes on; there are a bunch of other limitations.
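One way to read the "augmented external memory" point is as a response cache in front of the model: if a new question is close enough to one already answered, return the stored completion instead of paying for another call. Below is a minimal Python sketch; it assumes a crude token-overlap (Jaccard) similarity and a hypothetical `call_llm` function, whereas a real system would typically compare embeddings stored in a vector database.

```python
from typing import Callable, List, Tuple


def _tokens(text: str) -> set:
    # Lowercase, replace punctuation with spaces, split on whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return set(cleaned.split())


def jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class CachedLLM:
    """Wraps a model call and reuses answers for near-duplicate questions."""

    def __init__(self, call_llm: Callable[[str], str], threshold: float = 0.8):
        self.call_llm = call_llm
        self.threshold = threshold
        self.entries: List[Tuple[str, str]] = []  # (question, answer) pairs
        self.api_calls = 0

    def ask(self, question: str) -> str:
        for cached_q, cached_a in self.entries:
            if jaccard(question, cached_q) >= self.threshold:
                return cached_a  # cache hit: no API call, no token cost
        self.api_calls += 1
        answer = self.call_llm(question)
        self.entries.append((question, answer))
        return answer
```

The threshold is the key design choice: set it too low and different questions get the same stale answer; set it too high and the cache rarely saves a call.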

LLM Modules

Now I believe you can see that we need a bunch of programs around the LLMs to unlock their potential and overcome all these limitations. Essentially, they consist of a few modules. The first one is prompting. You need prompt templates that are optimized for a specific use case and that are also transferable or adaptive to diverse user inputs. Then you need context management, to connect all these completions, questions, and answers together, so that the LLM knows what has happened before. Have any errors happened? Am I doing it right? All of this is context. Then retrieval, so that it can access external data sources to give you the most timely and accurate answers. Then extended capabilities: of course, access to external tooling systems, but also enabling the LLM to respond not only in natural language but also with executable instructions, like function calls, so that it can perform complex tasks. Last but not least, state management, so that the LLM remembers the long-term goal. Say you want to perform a really complex task, and there is an ultimate long-term goal that you need to achieve. Good state management enables it to remember this goal, to understand which stage it is currently at, and how much progress it has made. Basically, those are the modules you will need, however you want to call it. It's an LLM-based application. You can call it an agent or something else.
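As a rough illustration of how the prompting, context management, and state management modules fit together, here is a minimal Python sketch. The field names and template wording are hypothetical; the point is that the long-term goal, the progress so far, and the recent conversation are all rendered into every prompt, because the model itself remembers none of them.

```python
from dataclasses import dataclass, field
from string import Template
from typing import List

# Hypothetical prompt template: goal, progress, and recent context in every call.
PROMPT_TEMPLATE = Template(
    "You are an assistant working towards this goal: $goal\n"
    "Completed steps ($done of $total): $completed\n"
    "Recent conversation:\n$history\n"
    "User: $user_input\n"
    "Decide the next step."
)


@dataclass
class AgentState:
    goal: str
    plan: List[str]
    completed: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def mark_done(self, step: str) -> None:
        self.completed.append(step)

    def current_step(self) -> str:
        remaining = [s for s in self.plan if s not in self.completed]
        return remaining[0] if remaining else "DONE"

    def render_prompt(self, user_input: str, max_turns: int = 4) -> str:
        # Only the last few turns are included to keep the prompt short.
        return PROMPT_TEMPLATE.substitute(
            goal=self.goal,
            done=len(self.completed),
            total=len(self.plan),
            completed=", ".join(self.completed) or "none",
            history="\n".join(self.history[-max_turns:]) or "(empty)",
            user_input=user_input,
        )
```

For a goal such as "Rebalance the portfolio" with the plan `["fetch prices", "compute weights", "place orders"]`, marking the first step done makes `current_step()` return `"compute weights"`, and the next prompt states "Completed steps (1 of 3)".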

Architecture of LLM Based Applications

This is the architecture of a general LLM-based application, with the modules that you need. Like regular web applications, you have the frontend, the backend, and some auxiliary tools. The frontend is also very important. As with regular web applications, you have to think about your user experience. Generative AI will really bring the next revolution in user interaction and experience, because it enables diverse ways for you to engage with your users, depending on your use case. Right now, the most common one we see is the chatbot. But there's more: you can have a chatbot, a digital assistant like Alexa that you interact with via voice, or robots. There are also diverse data formats, as we now have multimodal models. For instance, if you are working on marketing campaigns and have a use case for generating campaigns or ads with generative AI to make them more creative, then the ways for you to engage with your users will be via email, SMS, or your website. In the backend, of course, you have foundation models at the core. They can be proprietary model APIs from the cloud vendors or from OpenAI. There are also open-source ones, especially when your organization has legal requirements to host the models in-house. Then you can take models from Hugging Face and host them on your own machines. Of course, scalable and reliable machine learning infrastructure is always required, like GPUs and the various chips developed to support scalable inference most of the time, and sometimes training as well. There's also a cache for context and state management. I will talk about the data sources later.
That's about perception, about how you set up perception of the environment. On this side, you have the tools. Just like regular machine learning applications, your application still needs to follow the end-to-end machine learning lifecycle, from data ingestion all the way to data consumption. Where's your MLOps? Where's your monitoring? Maybe you need human reviews and human feedback in the loop. You also need to maintain checkpoints along the way, so that you can go back to a particular point. Here we have, for instance, additional tools like APIs or plugins. Information retrieval is, of course, a big topic. If you know RAG, retrieval-augmented generation, that's where you enable the LLM to access external data sources, and a lot more. And, of course, prompt tools. That is basically the whole architecture.
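The checkpointing idea can be as simple as snapshotting the agent's state after each step, so that a human reviewer (or the agent itself) can roll back to a known-good point. A minimal Python sketch, assuming the state is a plain dictionary; the `CheckpointStore` class is hypothetical and not part of any framework.

```python
import copy
from typing import Any, Dict, List


class CheckpointStore:
    """Keeps deep copies of agent state so execution can resume from an earlier step."""

    def __init__(self) -> None:
        self._snapshots: List[Dict[str, Any]] = []

    def save(self, state: Dict[str, Any]) -> int:
        # Deep copy so later mutations of the live state don't alter the snapshot.
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots) - 1  # checkpoint id

    def restore(self, checkpoint_id: int) -> Dict[str, Any]:
        # Discard everything after the chosen checkpoint and return a copy of it.
        self._snapshots = self._snapshots[: checkpoint_id + 1]
        return copy.deepcopy(self._snapshots[checkpoint_id])
```

If a later step produces, say, a bad trade plan, a reviewer can call `restore()` with the earlier checkpoint id and let the agent retry from there.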

Engineering vs. Technology

One thing I want to call out is that, for now, building an AI agent is purely an engineering challenge. It might be intuitive to call LLMs a new technology. For instance, when the paper "Attention Is All You Need" came out with its new architecture, we could say it was a new technology. Building agents, however, is an engineering problem: it's about engineering around LLMs so that we can unlock their potential. Let me explain in detail. What differentiates agents? It seems like you can call every LLM-based application an agent right now. Let's think about machine learning history. We started out decades ago with brute-force methods. Now we're in the connectionist paradigm, where we have neural networks whose nodes are connected, just like the neurons in your brain. We try to mimic how the human brain works. For agents, we try to apply a similar pattern and simulate how humans behave and make decisions. For instance, how do you learn? You interact with the environment. You perceive tons of data and patterns from the environment, and you get feedback. Then you are able to generalize, transfer all this knowledge, and scale it to various tasks. Then you use your embodiment, like your fingers, your brain, your arms, your legs, and a bunch of tools, to execute what you need to do in the real world. That's the whole loop.

LLM Powered Autonomous Agents

If we leave out the perception part and focus on where the capabilities of large language models stand out, then we get this. This famous architecture, from Lilian Weng of OpenAI, consists of four modules that really unlock the power of large language models. It doesn't mean it has to be only these four modules, or that they're all required; it's just the latest framework we have. If you have good ideas, you could change this paradigm in the future, and if there's a better one, we will add it to the architecture. Essentially, you have memory, as we mentioned. The short-term memory is the prompt you give to the LLM, which is constrained by the length of each LLM's context window. Then we have long-term memory: the training data, the fine-tuning data, and also the RAG system, the external datasets that you enable it to access. Perception of the environment can be part of memory as well. At first, it is short-term memory, because real-time perception of the environment is short term. If you enable continuous training or fine-tuning, then the perception of the environment eventually becomes long-term memory for the LLM. On the side, we have planning. Of course, there's chain of thought, which a lot of you are familiar with. Planning is also about sub-goal decomposition: decomposing very complex long-term goals into sub-steps and completing them one by one. Then there's reflection. It's about the loop: through interaction with the environment and with users, it gets feedback, and that's the reflection it gets within each cycle. Self-critique, based on feedback it gives to itself, is another way to do reflection. Of course, on the side you have the tools, like different function calls, and then the actions, where you define what the flow will be afterwards.
The synergy of all these modules equips an agent with a few features. The first one is some level of autonomy, where it is able to make decisions automatically. What should the next step be? Which tool should I use? Is there any error? Should I change the plan? Then it follows the automated workflow to perform the tasks. The second one is reactiveness: reacting to changes in the environment. The third one, ideally, is proactiveness: proactively taking actions without receiving any instructions from the user, and also trying to improve itself based on, for instance, reflections. Those are the features we expect from an agent.
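The split between short-term memory (bounded by the context window) and long-term memory can be sketched as follows: keep the most recent messages that fit into a token budget and move older ones into a long-term store, from which they could later be retrieved. This Python sketch approximates tokens by whitespace-separated words, which is an assumption; real tokenizers count differently, and the long-term list is a stand-in for a vector database or RAG index.

```python
from collections import deque
from typing import Deque, List


def count_tokens(text: str) -> int:
    # Crude approximation: one token per whitespace-separated word.
    return len(text.split())


class Memory:
    """Short-term window bounded by a token budget, plus a long-term overflow store."""

    def __init__(self, context_budget: int) -> None:
        self.context_budget = context_budget
        self.short_term: Deque[str] = deque()
        self.long_term: List[str] = []  # stand-in for a vector DB / RAG index

    def _window_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.short_term)

    def add(self, message: str) -> None:
        self.short_term.append(message)
        # Move the oldest messages to long-term memory until the window fits,
        # always keeping at least the newest message.
        while self._window_tokens() > self.context_budget and len(self.short_term) > 1:
            self.long_term.append(self.short_term.popleft())

    def context(self) -> str:
        return "\n".join(self.short_term)
```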

With those in mind, let's go back to the application and take a look at a few important modules. The first one is perception. Multimodality in perception: as foundation models become more powerful, they will be able to receive multimodal information. Then, how do you enable the agent to perceive and interact with the environment? For instance, a common approach we see in production with companies today is simply to set up a data pipeline, like a real-time data pipeline. Let me give you an example. Say you have used the shopping assistant on amazon.com, or on some other retail website. Normally, the shopping assistant will be able to answer some of your questions, or connect you with customer service behind the scenes. That is just a chatbot. Ideally, what we want is this: when you are interacting with the website, it captures certain user behavior, for instance, clickstreams. What are you browsing? What are you looking at? What are you interested in, what do you want to buy? All this user information. Then we can construct a data streaming pipeline to stream the clickstreams to the agent, the shopping agent. Say you're looking at some kitchen appliances. Behind the scenes, the agent has access to different tools, like recommendation systems, or email campaign platforms to send out marketing campaigns. At the same time, it has access to your product catalog. With all this, the agent will be able to detect patterns in user behavior and decide: it's time for me to send a marketing campaign for kitchen appliances to this particular user. That's the proactiveness we're talking about.
Basically, depending on how the user behaves, the shopping assistant takes certain actions without receiving any instructions from staff or anyone behind the scenes. That's the autonomy, reactiveness, and proactiveness we're talking about.
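The shopping-assistant example boils down to watching a stream of events and acting when a pattern appears, without waiting for an instruction. Here is a minimal Python sketch of that proactive trigger. The event shape, the threshold, and the `send_campaign` callback are hypothetical stand-ins for a real streaming pipeline, recommender, and campaign platform.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

Event = Tuple[str, str]  # (user_id, product_category), a hypothetical event shape


def watch_clickstream(
    events: Iterable[Event],
    send_campaign: Callable[[str, str], None],
    threshold: int = 3,
) -> None:
    """Trigger one campaign per (user, category) once views reach the threshold."""
    views: Dict[str, Counter] = defaultdict(Counter)
    already_sent: Set[Tuple[str, str]] = set()
    for user, category in events:
        views[user][category] += 1
        key = (user, category)
        if views[user][category] >= threshold and key not in already_sent:
            send_campaign(user, category)  # proactive action, no user request
            already_sent.add(key)
```

In production, `events` would come from a streaming source and the simple view counter would be replaced by whatever pattern detection the agent uses; the shape of the loop stays the same.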

Another interesting example: NVIDIA recently announced Project GR00T at their GTC conference. This is essentially a research initiative that aims to create a general-purpose foundation agent for humanoid robots, one that could transfer across multiple embodiments. Of course, it's still a long-term vision and research program. What is inspiring is the journey they have shared so far. If you look at the picture, what they claim is that general-purpose AI agents can be laid out along three axes: the number of skills they can perform, the embodiments they can master, and the realities they can control. First of all, they used Minecraft, the popular video game, a procedurally generated world of 3D voxels. If you have played it, you know there are basically no limitations in the game. You can do whatever you want, like building Hogwarts Castle. They gathered a vast amount of data from this environment: gameplay data, crafting recipes, and different terrains and monsters. Then they trained MineCLIP, and with it an agent that is able to complete one specific task, for instance, swimming or building a house or a base, after receiving an instruction from the user. After that, they scaled this up to Voyager, which explores the world indefinitely by building up a skill library of what it has learned. Basically, they just give it a high-level directive, like: explore as many novel items as possible. It is then able to do that by interacting with the environment and getting feedback. It's continuous learning.

Later on, they also did MetaMorph, where they have all kinds of different robots. You see different shapes, different embodiments in this virtual world. Basically, they have a predefined kinematic tree vocabulary to describe what these robots look like. Then they are able to train MetaMorph and transfer it across all these embodiments, no matter what the robot looks like. They have done a lot of other things, but the key takeaway from this journey is that, before you have a general-purpose agent, the gradual enhancement of specialized agents that excel at specific skills will, at some point, come together to form that general-purpose agent. Not the other way around. That aligns with what I've experienced so far, and with their vision as well. Imagine they want to eventually train this foundation agent. If they have already trained it on 10,000 different simulated worlds, then for it to generalize and transfer to the physical world, that will be just the 10,001st reality. It's about when generalization happens. It will be an interesting program to follow.

The second module, if we still follow this architecture, is actions and action groups. Tool use is part of the actions. If you have complex business logic, you can also define your own action flows and let the LLM trigger the actions. The key here is to optimize the specifications of the tools and actions, so that the LLM understands very well what exactly each action is for and can select it out of hundreds of others in its toolbox. Here's an example from the financial agent I mentioned I'm developing. It has a bunch of tools, for instance, Google Search and various API calls. One tool I gave it access to is Alpaca, a financial trading platform; I enabled the agent to call its API. The thing is, every time I told the financial agent it was time to do the trading, which should have meant invoking Alpaca, it instead tried to get real-time stock data from Google Search. It seemed not to know it was time to use Alpaca; it thought Google Search was good enough. What I did was optimize the specification, the README, of that particular tool, so that the LLM understood this is the action to take when it comes to trading. That's just one example. Actually, if you look at it from a human-computer interaction perspective, this is where abstraction happens. I believe a lot of you come from a developer background. In the future, you can regard the agent as a layer between you and your computer, especially for people who don't know how to program. The agent will, on your behalf, interact with whatever tools and functions are behind the scenes to perform the task. You use natural language to talk to your computer instead of programming languages.
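To see why the tool specification matters so much, here is a toy Python sketch. A real LLM chooses tools by reading their descriptions; we imitate that with a naive word-overlap score. The tool names and descriptions are hypothetical and are not the actual Alpaca or Google Search APIs. With a vague description, the search tool wins for a trading request; once the trading tool's description spells out when to use it, the selection flips.

```python
import re
from typing import Dict


def _words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tool(request: str, tools: Dict[str, str]) -> str:
    """Toy selector: pick the tool whose description shares the most words with the request."""
    req = _words(request)
    return max(tools, key=lambda name: len(req & _words(tools[name])))


VAGUE_TOOLS = {
    "web_search": "Search the web for real-time information such as stock prices and news.",
    "trading_api": "Brokerage API.",
}

SPECIFIC_TOOLS = {
    "web_search": "Search the web for real-time information such as stock prices and news.",
    "trading_api": (
        "Use this tool whenever you need to place a trade: buy or sell a stock, "
        "submit an order, or check the current price before trading."
    ),
}
```

For the request "Place a trade to buy 10 shares of the stock at the current price", `select_tool` returns `web_search` with `VAGUE_TOOLS` and `trading_api` with `SPECIFIC_TOOLS`. The lesson carries over to real function-calling setups: describe when to use a tool, not only what it is.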

Last but not least: state machines, state management, and context management, along with planning and reflection. If you have developed LLM-based applications, you know that orchestration brings all these components together and enables you to call all these APIs and large language models. For agents, you still need orchestration, but the loop becomes much more complex, because it's essentially a state machine on top of the LLM that brings in looping and feedback mechanisms, so that a task is not finished by one prompt but over multiple loops, until the result is satisfactory. Let's look at an example. Recently, the famous AI educator Andrew Ng, from DeepLearning.AI, described a framework for agentic workflow design. He named reflection and tool use as the most robust and important patterns. Reflection brings in a feedback system, just like a state machine. If you think that is very similar to traditional software engineering with loops and state machines, you're on the right track. That's exactly it. He also focuses on tool use. He shared some empirical data showing that a less performant LLM, say GPT-3.5, wrapped in an agentic workflow, can outperform the most state-of-the-art LLMs used without such a workflow. It's worth mentioning that a lot of the agents, especially the out-of-the-box agents you see on the market, are not agents as we described before, with perception and proactiveness; they're more agentic workflows. They have some agent-like behavior, but they're not complete agents. They focus on tool use and on automating the execution plan behind the scenes. For instance, here I have used Amazon Bedrock Agents as an example. Amazon Bedrock is a managed service that gives you API access to foundation models on AWS.
Also, if some of you watched OpenAI DevDay at the end of last year, they showed new features with a trip-booking demo. That's also an agentic workflow, rather than the agent we desire.
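The reflection pattern is, at its core, a loop: generate a draft, critique it, revise, and stop when the critic is satisfied or a budget runs out. A minimal Python sketch with stubbed `generate` and `critique` functions; these are hypothetical, and in practice both would be LLM calls, possibly to different models.

```python
from typing import Callable, List, Optional, Tuple


def reflect_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Return the final draft and every critique produced along the way."""
    feedback: Optional[str] = None
    critiques: List[str] = []
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)   # revise using the previous critique, if any
        feedback = critique(task, draft)   # None means the critic is satisfied
        if feedback is None:
            break
        critiques.append(feedback)
    return draft, critiques
```

The `max_rounds` cap matters: without it, a critic that is never satisfied would keep the loop (and the API bill) running indefinitely.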

Orchestration

So what exactly should an agent be? It's about orchestration. Here, I'm going to dive a little deeper. When I work on agents with different companies, one thing I do a lot is read source code. You all know LangChain and LlamaIndex, the orchestration frameworks. If you're working on agents, you probably also know AutoGPT or AutoGen, the open-source frameworks. The thing is that they're not production ready. Even LangChain is not really production ready. They overcomplicate things, and you will get stuck with a lot of errors in production. Of course, that's understandable, because they have to deal with a lot of niche edge cases. The idea is to look at these open-source frameworks, at how they write and structure their modules and how they write their prompts, and learn from them: this is a way I can also follow. Then, when you develop your own application and agent, you can apply similar practices by writing your own modules and putting them into production. Your own agent focuses on a specific use case, or a group of skills, so you don't have to handle too many niche cases.

Let's take a quick look at the orchestration framework I use. Essentially, it's a state machine, with an ultimate end goal defined by the user. The first thing it does is use an initial prompt to kick off the whole loop. The initial prompt defines the ultimate goal. It also defines the tools or commands the agent has access to, the resources or datasets it has access to, how to reflect, an example output, and other information, such as the current state. Everything is defined in this very big initial prompt. After the LLM gets the initial prompt, it starts planning, because the instructions in the initial prompt tell it to, for instance, decompose the ultimate goal and execute it step by step. Then it proposes the first action, which goes into the task queue, and the loop of task creation and execution starts. During execution in this loop, you have state management and context management in memory. The memory is essentially a vector DB, where all the states and memories are stored as embeddings. You can do similarity search over it, just like information retrieval in a regular RAG system. After each step, each loop, there is a reevaluation stage that criticizes the result of that step. The criticism is then fed back into the initial prompt. The prompt now also carries the current state and step, so it becomes the second prompt, or the third prompt, and kicks off the loop again. That's how the loop forms.
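The loop described above (initial prompt, planning, a task queue, execution, re-evaluation, and feeding the criticism back into the next prompt) can be sketched as a small state machine. Everything here is hypothetical: `plan`, `execute`, and `evaluate` stand in for LLM calls, a plain list stands in for the vector DB, and the `max_steps` guard is an assumption added so that a failing plan cannot loop forever.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


def run_agent(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str, str], str],
    evaluate: Callable[[str, str], str],
    max_steps: int = 10,
) -> Dict[str, object]:
    tasks: Deque[str] = deque(plan(goal))   # sub-goal decomposition
    memory: List[Dict[str, str]] = []       # stand-in for the vector DB
    criticism = ""
    steps = 0
    while tasks and steps < max_steps:
        task = tasks.popleft()
        # The next prompt carries the goal, the current task, and the last critique.
        prompt = f"Goal: {goal}\nTask: {task}\nPrevious critique: {criticism or 'none'}"
        result = execute(task, prompt)
        criticism = evaluate(task, result)
        memory.append({"task": task, "result": result, "critique": criticism})
        if criticism.startswith("RETRY"):
            tasks.appendleft(task)          # re-queue the task; the critique rides along
        steps += 1
    return {"done": not tasks, "steps": steps, "memory": memory}
```

If `done` comes back `False`, the agent hit the step budget; that is the natural point to ask a human for feedback instead of burning more tokens.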

The highlights here: first of all, the initial prompt usually contains an example output, say, a standardized JSON output. It instructs the LLM: every time you respond, follow this JSON format. Because every response follows the same template, we can automate the loop, and the format is easy for the LLM to understand. That's basically the orchestration, or the state machine, for agents today. You see why this is a pure engineering problem. Also, within each step, especially the execution and evaluation steps, you can have agents as well. For specific execution steps, you may want an agent optimized for that specific task: a specialized agent you put there to execute it. The same goes for evaluation. Now you see, it can be multi-agent, with a state machine on top of all these agents. That's the power of the state machine: it brings in the loops and the feedback.
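Because the loop is only automatable if every response follows the same template, the orchestrator usually validates the model's output before acting on it. A minimal Python sketch, assuming a hypothetical response format with `thoughts`, `command`, and `args` keys (not a schema from any specific framework):

```python
import json
from typing import Any, Dict

# Hypothetical output template the initial prompt asks the model to follow.
REQUIRED_KEYS = {"thoughts": str, "command": str, "args": dict}


def parse_agent_output(raw: str) -> Dict[str, Any]:
    """Parse and validate one model response; raise ValueError if it breaks the template."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"missing or mistyped key: {key}")
    return data
```

When validation fails, a common move is to send the error message back to the model as the next prompt and ask it to answer again in the required format.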

Just to quickly mention: there's LangGraph, released by LangChain, a library that also tries to bring in these cycles. It allows you to define multi-agent collaboration as a graph that, unlike a directed acyclic graph, can contain cycles. Within each node, you can have an action or an agent. On the edges, you define the relationships, that is, how the steps or information flow. Of course, this is quite simplified, because on the edges you basically only define the next step: a chain of actions, or a simple if condition. In a lot of scenarios where you need to define your own state machine, the business logic is very complex. Still, it's a good resource to experiment with. Let me also mention multi-agent systems a little. Things can get really complicated. For instance, you have probably heard of a research project called ChatDev. It tries to simulate and automate the whole software engineering workflow, that is, how software projects go into production within a company. You have the CTO and other C-levels, software engineers, product managers, and so on, and you can regard all these personas as specialized agents. They interact with each other through a high-level state machine and eventually complete the steps. Of course, it's not production ready, because it gets stuck somewhere, and the specialized agents are not good enough. But it's an interesting project to look at as well.
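To show what cycles add over a plain chain, here is a framework-free Python sketch of a tiny graph runner: nodes are functions that update a shared state, and an edge is either a fixed next node or a routing function that may point back to an earlier node. This is not the LangGraph API; the node names, state keys, and `END` marker are hypothetical.

```python
from typing import Callable, Dict, Union

State = Dict[str, object]
Node = Callable[[State], State]
# An edge is either the name of the next node or a function that picks it from the state.
Edge = Union[str, Callable[[State], str]]
END = "__end__"


def run_graph(nodes: Dict[str, Node], edges: Dict[str, Edge], start: str,
              state: State, max_steps: int = 20) -> State:
    """Run nodes along edges until END; cycles are allowed, bounded by max_steps."""
    current = start
    for _ in range(max_steps):
        state = nodes[current](state)
        edge = edges[current]
        current = edge(state) if callable(edge) else edge
        if current == END:
            return state
    raise RuntimeError("graph did not reach END within max_steps")


# Hypothetical two-node cycle: write a draft, review it, loop back until approved.
def write(state: State) -> State:
    drafts = int(state.get("drafts", 0)) + 1
    return {**state, "drafts": drafts}


def review(state: State) -> State:
    return {**state, "approved": int(state["drafts"]) >= 2}


NODES: Dict[str, Node] = {"write": write, "review": review}
EDGES: Dict[str, Edge] = {
    "write": "review",
    "review": lambda s: END if s["approved"] else "write",
}
```

Running `run_graph(NODES, EDGES, "write", {})` goes write, review, write, review and then stops with two drafts; the conditional edge on `review` is the "simple if condition" mentioned above, and the cycle back to `write` is what a DAG cannot express.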

What Are the Challenges?

We have talked all this way about these fancy ideas, now comes the cruel reality. It's not production ready. Especially, from my experience, if you have multi-agents, over three or four of them, it just crashes, or it's stuck in a loop without being able to get out of the error. For instance, like it has planned something, and then in the middle of the steps, there are some errors, obviously, this route, the planning doesn't work. Then, it doesn't know how to get out, so it gets stuck, and we spend a lot of money. This is very intuitive. Just imagine, because we're all talking about a bit, if you remember at the very beginning the J.A.R.V.I.S-like personal assistant, this is what we want the AI agents to be developed, because that's where we benefit the most. Just imagine how hard it is for you to hand over when you need to go on a long-planned leave or vacation to your teammates. I hope they're humans, I believe. Even handling to a human is that hard. You have all these thinkings, all these documentations, explanations, let alone machines. It makes sense from a high-level intuitive idea. What are the challenges, goes into production? I will just list a few of them. The first one is the failed planning. It has planned something very nice, seems should work. Eventually, it will get stuck in the middle. What we have experimented is that, in the middle, if it gets stuck, for instance, this whole process doesn't work. If you just give some human feedbacks to it, say, ok, maybe try this, or let's go back to the second step, or stop doing this anymore, some feedback like this, LLMs will be able, for most cases, to reroute out of this error loop. Human review still helps a lot. At the same time, it's about how LLMs are specialized or optimized for a particular task, for instance, for the financial agent. Essentially, if I use GPT-4 or cloud behind the scenes, without doing anything, they're just general-purpose LLMs. It doesn't know finance very well. 
That's why when it starts to plan something around financial problems, it will just do it like a novice. It won't plan it very well. If you optimize it within that specialization, it will be able to plan better. Then there is failing to choose the right tools, again as with the financial agent and the Alpaca APIs I gave it. This one is really about optimizing the specifications of your tools or actions. Then there is lost in the middle. This one is interesting; I stumbled upon it during this use case. Let's use finance as an example again. A typical task is that you need to run an analysis on the earnings calls every quarter. You need to extract insights from the earnings calls, and then maybe write up a summary. The thing is that most LLMs focus on the start and the end of a long input and don't pay as much attention to the middle; that is called lost in the middle. What I discovered is that Claude has actually been optimized a bit for this. It puts higher weight on the middle of these earnings calls. Last but not least, of course, failed ROI, because API calls are still very expensive. To put a use case into production, you have to think through all the tradeoffs between cost, quality, and the value it brings. This is the biggest challenge: finding the right use cases for different industries.
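The failed-planning loop described above can at least be bounded in code. The following is a minimal Python sketch, not the speaker's implementation: `plan_next_step`, `execute` and `ask_human` are hypothetical callables standing in for the LLM planner, the tool runner and the human reviewer. It caps the number of steps and, when the planner keeps proposing an action that has already failed, pauses to ask a human for a hint (the kind of "go back to the second step" feedback mentioned in the talk) instead of retrying until the budget is gone.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AgentState:
    goal: str
    # (action, succeeded) pairs recorded after each tool execution
    history: List[Tuple[str, bool]] = field(default_factory=list)
    # Human hints that the planner can read on its next step
    feedback: List[str] = field(default_factory=list)


def run_agent(
    state: AgentState,
    plan_next_step: Callable[[AgentState], str],       # hypothetical stand-in for the LLM planner
    execute: Callable[[str], bool],                    # runs one tool call, True on success
    ask_human: Callable[[AgentState], Optional[str]],  # hypothetical human-feedback hook
    max_steps: int = 10,
    max_failures_per_action: int = 2,
) -> str:
    """Plan/act loop with a hard step budget and simple loop detection."""
    for _ in range(max_steps):
        action = plan_next_step(state)
        if action == "DONE":
            return "done"
        failures = sum(1 for a, ok in state.history if a == action and not ok)
        if failures >= max_failures_per_action:
            # The planner keeps proposing an action that already failed:
            # ask a human for a hint instead of spending more API calls.
            hint = ask_human(state)
            if hint is None:
                return "stopped: stuck in a loop"
            state.feedback.append(hint)
            continue
        state.history.append((action, execute(action)))
    return "stopped: step budget exhausted"


# Usage with toy stand-ins: the "live quotes" tool is down, and the planner
# keeps retrying it until a human suggests the cached data instead.
def toy_planner(state: AgentState) -> str:
    if not state.feedback:
        return "fetch_live_quotes"
    if ("use_cached_quotes", True) not in state.history:
        return "use_cached_quotes"
    return "DONE"


def toy_execute(action: str) -> bool:
    return action != "fetch_live_quotes"


state = AgentState(goal="rebalance portfolio")
print(run_agent(state, toy_planner, toy_execute, ask_human=lambda s: "try the cached quotes"))  # done
print(state.feedback)  # ['try the cached quotes']
```

The thresholds (`max_steps`, `max_failures_per_action`) are arbitrary here; in practice they trade cost against giving the agent room to recover on its own.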
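On tool choice, the talk's advice is to optimize the specifications of tools and actions. As an illustration only, here is a sketch of a tightly specified tool plus a check that runs before a model-proposed call is executed. The tool `get_price_history` and its fields are hypothetical and are not taken from the Alpaca API. Vendors' function-calling interfaces typically describe parameters with JSON Schema; plain Python types are used here just to keep the example self-contained.

```python
from typing import Any, Dict, List

# Hypothetical tool definition for a trading agent. The name, fields and
# wording are illustrative only; this is not Alpaca's actual API schema.
GET_PRICE_HISTORY: Dict[str, Any] = {
    "name": "get_price_history",
    "description": (
        "Return daily closing prices for ONE US stock ticker. "
        "Use only for historical analysis; this tool cannot place orders."
    ),
    "parameters": {
        "ticker": {"type": str, "required": True,
                   "description": "Uppercase symbol, e.g. 'AAPL'."},
        "days": {"type": int, "required": False,
                 "description": "Number of trading days to return, 1-365."},
    },
}


def validate_call(tool: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Check a model-proposed tool call against the spec before running it.

    Returns error messages (empty list if valid) that can be sent back to the
    model so it can fix its tool choice or arguments on the next attempt.
    """
    errors = []
    params = tool["parameters"]
    for name, spec in params.items():
        if spec["required"] and name not in args:
            errors.append(f"missing required argument '{name}'")
    for name, value in args.items():
        if name not in params:
            errors.append(f"unknown argument '{name}'")
        elif not isinstance(value, params[name]["type"]):
            errors.append(f"argument '{name}' should be {params[name]['type'].__name__}")
    return errors


print(validate_call(GET_PRICE_HISTORY, {"symbol": "AAPL"}))
# ["missing required argument 'ticker'", "unknown argument 'symbol'"]
print(validate_call(GET_PRICE_HISTORY, {"ticker": "AAPL", "days": 30}))
# []
```

The explicit "cannot place orders" line in the description is the kind of detail that helps a model rule a tool out, not just in.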
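The speaker's remedy for lost in the middle was to pick a model that handles it better. A common complementary workaround, not described in the talk, is to avoid very long prompts altogether: split the earnings-call transcript into overlapping chunks, extract insights from each chunk, then merge the partial results (a map-reduce pattern). In the sketch below, `extract_insights` and `combine` are hypothetical LLM-backed callables, and chunk size and overlap are counted in words rather than tokens for simplicity.

```python
from typing import Callable, List


def chunk_words(text: str, chunk_size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into overlapping windows of `chunk_size` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def map_reduce_insights(
    transcript: str,
    extract_insights: Callable[[str], str],   # hypothetical LLM call per chunk
    combine: Callable[[List[str]], str],      # hypothetical LLM call over partial results
    chunk_size: int = 800,
    overlap: int = 100,
) -> str:
    """Map: extract insights from each chunk. Reduce: merge the partial insights."""
    partials = [extract_insights(c) for c in chunk_words(transcript, chunk_size, overlap)]
    return combine(partials)


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a boundary visible in at least one chunk; every passage now sits near the start or end of some prompt.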
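The ROI question comes down to arithmetic that is worth doing before building anything. The sketch below estimates the API cost of one agent task from the number of LLM calls and tokens per call, and compares it with the value each task delivers. All prices and volumes are made-up placeholders, not any vendor's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    # Placeholder prices in USD per 1,000 tokens; substitute your vendor's current prices.
    input_per_1k: float
    output_per_1k: float


def cost_per_task(pricing: ModelPricing, calls_per_task: int,
                  input_tokens_per_call: int, output_tokens_per_call: int) -> float:
    """An agent makes many LLM calls per task (planning, tool use, retries)."""
    per_call = ((input_tokens_per_call / 1000) * pricing.input_per_1k
                + (output_tokens_per_call / 1000) * pricing.output_per_1k)
    return calls_per_task * per_call


def monthly_net_value(value_per_task: float, cost: float, tasks_per_month: int) -> float:
    """Value the automation delivers minus what the API calls cost."""
    return tasks_per_month * (value_per_task - cost)


pricing = ModelPricing(input_per_1k=0.01, output_per_1k=0.03)  # hypothetical numbers
cost = cost_per_task(pricing, calls_per_task=20,
                     input_tokens_per_call=3000, output_tokens_per_call=500)
print(f"cost per task: ${cost:.2f}")                                  # cost per task: $0.90
print(f"net per month: ${monthly_net_value(2.00, cost, 1000):.2f}")  # net per month: $1100.00
```

Because cost scales linearly with the number of calls, an agent that loops twice as long costs twice as much per task; a negative net value is exactly the failed-ROI case described above.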

How To Improve?

First of all, cliché but important: prompt engineering. Always put a lot of effort into prompt engineering. Don't think of it as a repetitive, boring job; it makes a huge difference. You all know "let's think step by step" in prompt engineering. Somebody actually stumbled upon a new one that just asks the model to take a deep breath, and it boosted performance by 10%. Just an anecdote, but it still shows the power of prompting. Then fine-tuning, which is what enables and empowers something like the financial agent. Also, the choice of LLM. As I said before about Claude, there are different LLMs to choose from, and you should not stick with one vendor. All these AI startups and products we see are backed by a bunch of LLMs with different functionalities, because they excel at different things. These are just a few of the criteria.
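Two of the levers from this list, prompt wording and model choice, can be combined in a thin routing layer. This is a hedged sketch: the task types and model names in `ROUTES` are invented placeholders, and the step-by-step cue is simply the well-known phrase quoted in the talk. Keeping a per-task preference list that spans vendors is one way to avoid being locked into a single provider.

```python
from typing import Dict, List, Set

# Task type -> candidate models in order of preference. The model names are
# placeholders for models from different vendors, not real identifiers.
ROUTES: Dict[str, List[str]] = {
    "long_document": ["vendor-a-long-context", "vendor-b-general"],
    "code": ["vendor-b-code", "vendor-a-general"],
    "default": ["vendor-a-general", "vendor-b-general"],
}

STEP_BY_STEP = "Let's think step by step."


def build_prompt(system: str, question: str, step_by_step: bool = True) -> str:
    """Assemble a prompt, optionally ending with a step-by-step reasoning cue."""
    parts = [system.strip(), question.strip()]
    if step_by_step:
        parts.append(STEP_BY_STEP)
    return "\n\n".join(parts)


def pick_model(task_type: str, available: Set[str]) -> str:
    """Return the first preferred model that is currently available, across vendors."""
    for model in ROUTES.get(task_type, ROUTES["default"]):
        if model in available:
            return model
    raise LookupError(f"no available model for task type {task_type!r}")


up = {"vendor-b-general", "vendor-b-code"}  # e.g. vendor A is down or over budget
print(pick_model("long_document", up))       # vendor-b-general
print(build_prompt("You are a financial analyst.", "Summarize the main risks in Q3."))
```

Unknown task types fall back to the `default` route, so adding a new use case does not require touching the routing code.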

From Ideation to Prod

I hope you get the high-level idea of what you need to consider and which practices to adopt when taking your agent from ideation all the way to production. It's an iterative process that acts as a big filter: in the end, you are left with only a few use cases.

Key Takeaway

Here's a beautiful story, the one about Einstein. Basically, the answers keep changing. The way you used to do business is not the way you should do it anymore. Generative AI is a fast-moving industry, so make sure you always go to the original sources. Don't rely on second-hand opinions. Always try to form your own opinions by looking at the source code, apply first principles, and try to innovate.

Recorded at: Jun 20, 2024

Tingyi Li
