Transcript
Tejas Kumar: My name is Tejas Kumar. I'm a developer relations engineer at IBM. I work on AI, on a lot of AI solutions, mostly AI research and development, and on how we can make AI more accessible to developers. As you know, IBM has a ton of AI offerings like watsonx. We actually have our own models, comparable to something like GPT-5, but ones that we make ourselves. How do we make this accessible to you to build things and build revolutionary technology? We're not going to talk about IBM at all, actually, because this talk is not on behalf of IBM. It's not IBM speaking, it's me speaking. We're going to talk about AI innovation in 2025. To understand AI innovation in 2025, we need to look at the broad spectrum of time: how did we get to where we are today, and where are we going tomorrow? There's just a lot of hype around AI today.
If you want to increase the valuation of your company, just put some AI feature on it and watch the sharks come to you. There's a lot of influencers saying things like, you're going to miss out if you don't do this thing. There's a lot of stuff on LinkedIn and X that can be annoying. Anyone annoyed with AI here today? This talk is not that. I just want to be very clear. We're going to, through the lens of science and reason, look at actually what AI is, and how we can apply it. We'll also get into things like Model Context Protocol, MCP, all of that. There is no hype here. This is not marketing. This is definitely not fearmongering. This is just logic.
AI, Yesterday
We're going to talk about AI. We're going to look at AI yesterday. We're going to look at AI today. We're going to look at AI tomorrow. Let's start right at the beginning and talk about AI from yesterday. I don't mean yesterday, I mean, where did it start? I think this is important, because there's a lot of companies making a lot of money off the term AI. There's a lot of people saying a lot of things about AI as if it's new. I'm here to tell you, AI is not new. If we think from first principles, artificial intelligence is nothing new. We've been simulating intelligence for a long time. How early did it start? It really started with statistics, in 1906. A mathematician and statistician named Andrey Markov was really interested in Russian poetry. He was like, interesting, there's some type of rhythm to the words in these poems, and through statistical modeling, he came up with a way to predict, with reasonable accuracy, the next state of a system based on the current state of the system. What does that mean?
In the case of poems, the system is based on words. You have a word, and then you can predict the next word based on the current word. He found success with this. This would become what we call Markov chains, and eventually grow into the statistical Markov models named after him. The weather is actually predicted, oftentimes, with a giant Markov chain. How does that work? You know what the weather is today. I mean, look outside. You know all the possible states of the system: it can be sunny, it can be cloudy, it can be rainy. If you're in Germany, it can only be rainy. It can be many states. Through Markov modeling, what you can do is say, today is this, therefore, tomorrow, the chance of that, aka the next state, is this probable. We use Markov chains every day, especially if you have an iPhone.
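To make that concrete, here's a tiny weather Markov chain in Python. The transition probabilities are made up for illustration; the point is just that tomorrow's state is sampled from today's state alone, nothing else.

```python
import random

# Transition probabilities: given today's weather, how likely is each
# state tomorrow? (Illustrative numbers, not real meteorology.)
transitions = {
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.4, "rainy": 0.4},
}

def next_state(current: str) -> str:
    """Sample tomorrow's weather from today's alone: the Markov
    property says only the current state matters, not the history."""
    states = list(transitions[current])
    weights = list(transitions[current].values())
    return random.choices(states, weights=weights)[0]

forecast = ["sunny"]
for _ in range(6):
    forecast.append(next_state(forecast[-1]))
print(forecast)
```

Swap the weather states for words and you have exactly the iPhone keyboard game from a moment ago.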
If you have an iPhone, and you pull up the keyboard, above the keyboard you will see word suggestions, three of them: a left, a middle, and a right. I don't know if you've played this game, but I encourage you to try it: pull up the keyboard, type one word, just like I, the letter I, or me, and then you get these three suggestions. If you tap on the middle suggestion repeatedly, over and over again, you get a sentence, maybe a paragraph, that doesn't make any sense. That's because it's a giant Markov chain: it predicts the next word based on the current one. Language doesn't work this way. This is the beginnings of AI: we predict the next state based on the current one.
Fast forward a long time, and AI is pretty old. Before I was born, we saw a lot of AI in the world. For example, look at this, this is an intelligent system: Pac-Man eats the big ball, and then the ghosts are like, whoa, and they start running away, because now he can eat the ghosts. I would wager that's artificial intelligence. It's literally, if something happens, then do that other thing. We'll talk about the difference between this and machine learning, but do not be mistaken, this is a system that simulates intelligence: it is artificially intelligent.
The limiting factor here is that the intelligence is just a bunch of rules that we know ahead of time. People wrote this code, and they said, if this, then that, and that's how you simulate it, that's how you create artificial intelligence. The technical term for this, as many of you know, is rule-based AI, where the programmers of, in this case, Pac-Man, knew the rules ahead of time. I grew up playing Mortal Kombat, and Prince of Persia, and a bunch of games like Super Mario where you have enemies coming at you, and you do stuff. Those are all examples of rule-based AI. It's old. Pac-Man came out around 1980.
A few years before this, we also started to see the roots of what would become generative AI, the way we reason about AI today. There was a paper published that highlighted something called backpropagation, an algorithmic construct that set the stage for deep learning and the next evolution in AI. For those of you who don't know very much about machine learning, I'm here to tell you that you don't actually have to know machine learning to be an AI engineer. We'll talk about that later. How machine learning models work is, you have an input, and they predict an output with some level of certainty or confidence. Between the inputs and the outputs are a bunch of what are called layers, though I don't love this term.
If you've ever seen a soundboard, like a big mixing board, an audio console that audio engineers use, you'll usually see a lot of strips with a lot of little knobs, many knobs. That's what a machine learning model looks like. You've got an input layer, many hidden layers of knobs that you can turn. You turn these knobs and you tweak them so that the output is what you want. Backpropagation algorithmically says you do one pass through all your layers of probabilities, and then you arrive at an outcome.
Then the algorithm literally backpropagates: it looks at the weights, the knobs, and says, I need to adjust these to get a higher-confidence outcome. It's just a big loop, where you go and turn knobs a bunch of times until, at the end, you get what is actually a machine learning model. Each knob, you may call a parameter. A model like GPT-4 today is estimated to have hundreds of billions of parameters. Think of it exactly that way: hundreds of billions of little knobs you can turn to increase the probability of something useful. That's a lot of knobs. That's, in essence, a machine learning model. It started in 1974 with this paper on backpropagation and deep learning, and for the next few decades, that's how it worked. There were many new architectures that used different configurations of these layers: things like recurrent neural networks, and GANs, Generative Adversarial Networks.
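To see the knob-turning loop in code, here's the smallest possible version: one knob, a single weight, learning y = 3x by doing a forward pass, measuring the error, and backpropagating to adjust the weight. Real networks do this across billions of weights through many layers, but the loop is the same.

```python
# One "knob" (weight w) learning y = 3x from examples, to make the
# forward-pass / backpropagate / adjust loop concrete.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = 0.0            # the knob, starting in the wrong position
lr = 0.05          # how far to turn the knob each step

for epoch in range(200):
    for x, target in data:
        pred = w * x                  # forward pass
        error = pred - target         # how wrong were we?
        grad = 2 * error * x          # backpropagation: d(error^2)/dw
        w -= lr * grad                # turn the knob downhill

print(round(w, 3))  # converges near 3.0
```

Training a model is this, repeated at absurd scale, until the knobs produce useful outputs.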
Fast forward to 2017, and a team of mostly Google Brain scientists published a paper called "Attention Is All You Need" that highlighted a new architecture for models, specifically language models. An architecture, with machine learning models, is nothing more than a configuration of layers: you have this layer that does this, that layer that does that, and you arrange them in an order to get the output you want. This paper was foundational; it is the basis, the beginnings of ChatGPT. It was published in 2017. Look at those names: they're all Google, except one person from the University of Toronto, and Illia, who is listed without an affiliation. This paper outlined a mechanism, through code, where we can model attention very similar to human attention. Think about how we reason about language: we don't pay attention to single words.
Usually, if you're reading a book, a blog post, or this slide, you don't focus on one word and then think of the next; you focus on many words in parallel. That's exactly what they modeled here. It's called multi-headed attention, where, in code, you model attention on one token, which is roughly one English word, and its surrounding tokens, and you pay attention to all of them in parallel. This paper is 11 pages long. It's not very challenging to read. I'd encourage you to read it yourself. It's public. It's free online. This would set the basis for what would become ChatGPT.
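Here's a stripped-down sketch of the core idea, scaled dot-product attention for a single query, in plain Python. Real transformers add learned projection matrices and run many of these heads in parallel; the vectors below are toy embeddings I made up.

```python
import math

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query over a tiny
    sequence: score each token, softmax the scores, and return a
    weighted mix of the value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Three toy token embeddings; the query "attends" mostly to the
# tokens whose keys point the same way it does.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out = attention([1.0, 0.0], keys, values)
print(out)
```

Every token attends to every other token this way, simultaneously, which is what lets the model weigh a whole sentence at once instead of one word at a time.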
The point I'm about to make here is very important. From this paper, the implementations were a model called GPT-1, then GPT-2, and then GPT-3. All these models had great results, because this is a very good architecture. They released GPT-3, and nobody cared. ChatGPT blew up not because of the model. In 2022, OpenAI had this great model, it was doing good work, but nobody knew or cared. Until what? Until they put a chat user interface on top of it. Until they literally did some UI work, a little text box, a little UI, and now we stream in some words. Then people started to care. This is a very important point: you can innovate something great and wonderful, but if the user experience is just not there, nobody is going to care. That's what made ChatGPT successful. Sure, the model is great. The product UI and UX is ultimately what got them to 100 million plus weekly users. It's a very important point for us.
ChatGPT came out, 2022. Who used ChatGPT in 2022 for the first time? It blew up. I'm privileged and thankful to say I've been working in AI since before this. ChatGPT came out in 2022. Revolutionary, undoubtedly reached many millions of users very rapidly, but there were problems with it very early on. Specifically, there were three problems with it. What were those problems? Problem number one, hallucination. It would just make up nonsense.
For us, for some reason, we give it authority. When it makes up nonsense or hallucinates, we think, that must be right. I don't know what it is with us, but we see something and we think that must be true, unless it's very clearly false. Google tried rolling out AI Overviews. Some of you may have seen this, the memes of somebody typed into Google, how many cigarettes should a pregnant woman smoke every day? Just so you know, the answer is zero. I just want to be clear about that. The Google AI Overview said between three and five every day. Hallucination is a real problem. That's probably the biggest. Problem number two was knowledge cutoff.
The model behind the first ChatGPT finished its training. What does finished its training even mean? It just means the big bunch of knobs, the weights, were sealed. The model was ready around late 2021. From there, the model is not going to be retrained. It takes a lot of compute and a lot of resources and a lot of GPUs to do this. They have a model trained up until a certain date, and then its knowledge is just frozen there. If you ask a question like, what movie should I go see today? It will say, I can't help you. It's a problem. There is no access to real-time information. If you want to use this in your company, and you have new users who signed up, and you're maybe doing a prompt like, what ads should I show my new users based on their preferences? The model will say, no, I was trained in 2021. I don't even know who you are. Useless.
Then the third problem was the problem of limited context. These models have, you could say, memory. That's an oversimplification, no doubt. They can hold in context a finite number of tokens, where a token is roughly one English word. Today, most context windows are around 100,000 to 200,000 tokens, so roughly that many English words. Any more than that, and you run into problems. The model will just start over as if it's a new conversation. There are some models with very large context windows. With Gemini, Google's models, you could actually put the entire Harry Potter book series into the context window. That's cool, but it's rare. Finite context is still a problem. We have three big problems. Problem number one is hallucination. Number two is knowledge cutoff. Number three is finite context.
I'd like to actually show you these problems so we can explore together how you solve them, and you can solve them. The solution from 2022 is still the solution today. I think this is very important to spend some time demoing. Someone's notebook is here. Let's read it. It's a list of upcoming talks. I'm going to use a tool called Langflow. Langflow is free and open source; there's no marketing here or money to be made. It's great for diagramming though. I'm going to show you this problem. I'm going to make a flow. It's visually a great way to understand how these things work. I have a chat input and a chat output. You can think of this as a chat user interface. I'm going to get a language model from OpenAI, and I'm going to choose an old model, GPT-3.5.
Now I'm going to connect the chat input to the chat output right here. That's basically it. We have a chatbot now that uses OpenAI's older model. I'm going to ask something real-time, like, what movies are playing in the theater today? As you can expect, "I'm sorry, but I'm unable to provide real time information". This is the classic experience of early ChatGPT. This model doesn't know; there's also the knowledge cutoff and so on. How do we solve this? We solve this with a very popular technique that's not cool anymore because we're still in yesterday. It's called RAG. Anyone heard of RAG? It's not what I'm wearing. It stands for Retrieval-Augmented Generation. Retrieval, meaning you retrieve accurate, up-to-date information: you do a fetch, a network request, you get the information, you put it in the prompt, and you give it to the language model, which generates the answer. That's called RAG.
Let's look at how RAG actually works here. I'm just going to zoom out. We'll implement RAG here. What we need is we need to retrieve. We need to get movies somehow from the internet. Let's assume we have this information available because we do. I'm going to leave the chat input right there, but I need to get movies somehow. If I go to Rotten Tomatoes, these are a list of movies in the theater. Fantastic. This is my big database. I'm going to get this data. I'm going to get a URL component here, and I'm going to paste this. What this will do is it will get me the text content of this website. This is equivalent to getting some stuff out of your database. Now we have the answer. We have the question from the user. I'm going to do some prompt engineering. This is literally what that term means. I have a prompt template and here I'm going to say, this is the question from the user, and I'll do question. Here are some data from my database, data. You just have these two things, a question and data.
The question goes here, the data goes here, and we give this prompt to the language model. Notice the language model itself has not changed. What has changed is the way I build my app. I just get data from my database and so on. Let's now go try that same prompt here in the playground. What movies are playing in the theater today? The language model is still GPT-3.5, but now, just like that, it knows stuff. This is RAG, and this is how we did RAG a lot of the time. This is how companies still do it today. It's a very good thing. There's a movie titled, "If I Had Legs, I'd Kick You". I don't even know what that's about. Is that true? It's this one, 95. Anyway. Maybe the movie is AI generated. This is working.
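The whole flow fits in a few lines of Python. In this sketch, the retrieval step and the model call are stubbed out so the shape is visible; in the demo the data comes from fetching the Rotten Tomatoes page, and `call_llm` would be a real model API.

```python
# Minimal classic-RAG sketch. `call_llm` and `retrieve_listings` are
# stand-ins so the flow runs on its own; swap in a real fetch and a
# real model call in practice.
def retrieve_listings() -> str:
    # In the demo this is the text content of the Rotten Tomatoes
    # page; stubbed here with canned "database" content.
    return "Now playing: Movie A, Movie B, Movie C"

def call_llm(prompt: str) -> str:
    return f"(model answers using: {prompt[:60]}...)"

PROMPT_TEMPLATE = """This is the question from the user: {question}
Here is some data from my database: {data}
Answer using only the data above."""

question = "What movies are playing in the theater today?"
prompt = PROMPT_TEMPLATE.format(question=question,
                                data=retrieve_listings())
print(call_llm(prompt))
```

Notice the model never changes; only the prompt does. That's the entire trick.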
If you want to see RAG in the wild today, where can you go look? I have news for you. I don't know if you've noticed this, but ChatGPT itself performs RAG. It's a first-class technique when working with AI. In fact, if you don't believe me, let's go to chatgpt.com, and I'll ask the same prompt. Check it out. It's going to search the web. It's literally doing that Rotten Tomatoes thing. There you go. It depends on what you mean, but in Munich these are the movies. There's no If I Had Legs, I'd Kick You, but that's RAG. It literally shows you the sources from where it retrieved the information. RAG is the de facto standard technique. It works great. That's how you use it, if you live in 2024.
The year 2024 was the year of RAG. Everyone was performing RAG. They were performing RAG this way to get real-time information. This solves two of three problems. It solves problem number one, which is hallucinations. As you can see, there are no hallucinations here. Another way to reduce hallucinations is to lower the temperature of your model. If you go to the controls of your language model, you see temperature right here. Something like 0.1 is a very good value. This rule applies to humans and LLMs: the higher the temperature, the more you hallucinate. It's true. You ever had a fever and you start seeing some stuff? It's the same. You want to keep the temperature low, and this is how you do RAG. That solves two of three problems. It solves hallucinations; no doubt, the movies are not hallucinated. It solves the knowledge cutoff; it got movies from 2025, and this model was finished around 2021. What about context windows? How do we solve that? The solution for this is something called vector search.
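What temperature actually does, mechanically, is rescale the model's raw scores (logits) before they're turned into probabilities. Here's a self-contained sketch with made-up logits: low temperature makes the model stick to its top pick; high temperature spreads probability onto unlikely tokens.

```python
import math
import random

def sample(logits, temperature=1.0):
    """Divide logits by temperature, then softmax and sample.
    Low temperature sharpens the distribution; high temperature
    flattens it, so surprising picks become more likely."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.1]  # token 0 is the model's best guess
low = [sample(logits, temperature=0.1) for _ in range(1000)]
high = [sample(logits, temperature=5.0) for _ in range(1000)]
print(low.count(0) / 1000)   # nearly always the top token
print(high.count(0) / 1000)  # probability mass spread across tokens
```

A setting like 0.1 is the code-level version of "keep the fever down."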
Anyone familiar with vector search? Vectors. It's like a search engine, but it searches not on keyword density, but instead on semantic meaning. It associates words like animal and pet with dog, even though the word dog may not match. Similarity search allows you to store little pockets of text in your vector database, and based on a user's incoming query, you find the closest linguistic result to what the user wants, and you put that in your prompt. That really helps long context because you only get relevant context, and you're selective with what context you put in your context window. I'm using the word context a lot, but that's how it works. RAG was a great solution and it worked well, almost every single time. This was 2024. We started in 1906, went to 1974, 1980, 2017 with the transformers. We're now in 2024.
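Here's a toy version of that similarity search, with hand-made three-dimensional "embeddings" standing in for a real embedding model. The mechanics are the same: rank stored texts by cosine similarity to the query vector and keep only the closest ones for your prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy embeddings: a real system gets these from an embedding model,
# where "animal" and "pet" land near "dog" with no shared keyword.
docs = {
    "dog":     [0.9, 0.8, 0.1],
    "pet":     [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.0, 0.9],
}

def search(query_vec, k=1):
    """Return the k documents closest in meaning to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]),
                    reverse=True)
    return ranked[:k]

print(search([0.85, 0.85, 0.1], k=2))  # the animal cluster, not 'invoice'
```

Only those top-k chunks go into the prompt, which is how vector search keeps you inside a finite context window.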
AI, Today
Let's talk about today. What is 2025 characterized by in the world of AI? Anyone been to San Francisco? If you go to San Francisco, people start conversations by saying 2025 is the year of agents. This actually is a thing. You go sit down to have coffee with someone, and before they say hi, they're like, 2025 is the year of agents. It's a different culture. That's a characteristic of 2025. 2025 is the year of AI agents. What are agents? Now we opened a can of worms. We have to talk about agents and we have to talk about agents in detail. That's the only way we talk about things. This is today, 2025. Agents, what are agents?
Specifically, what is an agent? It's very important to reason from first principles. That's how we all get there together. I'm not some guy saying big words to you. An agent is a person or an entity that has agency. That's what it is. Then, what is agency? Agency, classically defined, involves a person or thing, and that thing can be a computer, so organic or artificial, through which power is exerted or an end is achieved. Meaning, if you're a human being here and you can make a choice: I'm choosing to come on stage and talk to you. That's me using my agency. I'm making an effort to do a thing for you today. Human agency is the ability to make decisions to do tasks. Artificial agency, or AI agents, is the same thing, just in code. It's the ability to make choices to do tasks. Let's walk through a really practical version of what agency actually looks like.
If you come to me and you say, Tejas, multiply the first 10 prime numbers, I would look at you and say, I don't even know what a prime number is. I would say, I don't know how to do that, but I have a language model in my brain, literally, to parse language. We all do; call it YourNameGPT. You're going to tell me to do some complex arithmetic, and the language model in my brain is going to say, you can't do that, but there's a tool you can use, based on its description, that was taught to you in fourth grade math class. For some of you, this may be resonating: the tool is called a calculator. I learned how to use this tool.
More importantly, I went to school and I learned its description: use this for arithmetic. Now I know there's a tool. What's going to happen is I'm going to pull up the tool with my agency. You understand, my agency is going to give me this tool. I'm going to use this calculator. Then the language model in my mind is going to generate the inputs to the tool. You say, multiply the first 10 prime numbers. Then the language model in my mind generates, 2 times 3 times 5 times 7, and so on. The inputs come from my brain. Then I do the final thing, which is I hit the equals button, and the tool gives me an output. It shows me the value, and then it goes straight to my language model. Then I tell you the answer. This whole thing that I just described is literally what AI agents are and how they work, but in code.
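That loop, written as code, looks roughly like this. The "language model" here is stubbed with a keyword matcher so the sketch runs on its own; in a real agent, an LLM reads the tool descriptions, picks a tool, and generates its input.

```python
# Sketch of the agent loop from the calculator story: pick a tool by
# its description, generate the tool's input, run it, read the output.
TOOLS = {
    "calculator": {
        "description": "use this for arithmetic",
        # eval() is for this demo only; never eval untrusted input.
        "run": lambda expr: str(eval(expr)),
    },
    "web_search": {
        "description": "use this to browse the internet for answers",
        "run": lambda q: f"(search results for {q!r})",
    },
}

def fake_llm_choose(task: str) -> tuple:
    """Stand-in for the model: choose a tool and generate its input.
    A real agent would prompt an LLM with the tool descriptions."""
    if "multiply" in task or "calculate" in task:
        return "calculator", "2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29"
    return "web_search", task

task = "multiply the first 10 prime numbers"
tool_name, tool_input = fake_llm_choose(task)
observation = TOOLS[tool_name]["run"](tool_input)
print(tool_name, "->", observation)  # calculator -> 6469693230
```

The key detail, which comes up again in the demo, is that the tool's description is what lets the model choose it: a vague description means the wrong tool, or no tool at all.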
Let me show you a demo of agency. You said multiply the first 10 prime numbers. This is the demo. Check it out: 2 times 3 times 5 times 7 times 11, and so on. There, that's me using agency. How then do we do this with AI? How do we do this artificially? Let's take a look. If we come back to Langflow. I'm using Langflow again to show you the flow of things. That's the point. We're going to get rid of all of this. We're going to perform RAG, but this is going to be agentic RAG. What's the difference between classic RAG and agentic RAG? In agentic RAG, the machine makes the decisions. I don't make the decisions. I'm going to delete this prompt template. I'm going to leave my chat input, and the URL component I'm going to turn into a tool that the agent can use. I turn on tool mode. We're going to get rid of this language model.
Instead of a language model, we're going to actually use an agent. You may ask, what is an agent? An agent is a language model plus, plus. You're a language model, but you have more. That's exactly what an agent is. We'll give this tool the URL tool to the agent. We give the chat input to the agent. Just like that, we've got a cool agent that can browse the internet. Let's try this again. We're going to check for RAG. Let's choose a better model. This is not going to work for reasons I will tell you.
If I send this prompt now, you can watch the agent use its agency. This is that moment where you go like, that's so cool. This is not working because I didn't plug in the chat output, so you didn't see any output. Let's try that again, and we'll send this. Now you can see the agent, it got the current date and it said to find out, go check the theater's websites. It didn't do this for a few reasons. One, it didn't know what tool to use. The description of this tool is not clear enough for the language model to understand. I'll say, use this tool to get movie listings from Rotten Tomatoes. I can be as prescriptive as I want in the description, but now I've clarified what the tool does, and so I can do it again. Now, because of the language model connecting the agent to the tool, suddenly it knows, look at this, it's accessing fetch content.
Going to Rotten Tomatoes, because I described the tool, and it performed agentic RAG. You see that? The cool thing about this is, I didn't tell it where to get the data from. This whole thing was generated by the agent itself. It knew to go to this URL. I didn't copy paste this. This was the agent thinking. It was literally agency choosing which website to go to. In fact, we can do better. We can say, we'll just change the description, use this tool to browse the internet for answers. How cool is that? Now we're getting more general. Now we can do something like, when is Tejas' talk at InfoQ Dev Summit 2025, let's do Munich, just for safety? Now the agent, with its agency, can go browse the internet, fetch content, and it got a 404 page. That's unfortunate. Why is this a 404?
Let's go back to that. Let's actually use a different tool, because the URL tool allows the LLM to think up some URL, and that URL may not exist. That's feedback for you. Instead of thinking up a URL, wouldn't it make more sense to just think of a search query? Because URLs are hit or miss. Let's do this. Let's go web search. This makes more sense. We'll do tool mode. We'll give it this tool. Let's just check that tool description. Performs a basic DuckDuckGo search. Great. That's actually what I want. Let's try that again. This time, different tool, but same task. Now it's using the different tool, performing RAG, and in a couple seconds, you'll decide how you're going to rate this talk. That's what's going to happen. Here we go. It searched the web. This is the search query. You can inspect it. It found these results. Good start. Now let's check.
"Tejas Kumar's talk is scheduled for Thursday at 9 a.m., CEST, and the talk is titled". It did that. The agent used RAG itself. This is where we are in 2025. Agents just have a mind of their own. They can do things. Let's actually do the multiplication thing as well, because I feel like it's parallel. There's a calculator here. Calculator. You use tool mode, and you give it the tool. Agents can use as many tools as you want. We'll now do, multiply the first 10 prime numbers. It'll just do the job. It's using evaluate_expression. It did that. It got 6,469,693,230, about 6.5 billion. It did it by literally generating the inputs and reading the output, as I did. It's me, but fake.
Or another way of doing it, it's a human agent, but artificial. This is how AI agents work. 2025, this year, was characterized by AI agents. Everyone's doing agents always everywhere to the point where it's irritating. There's a new coding agent every week. If you want to know which is the best one, use Cursor. I just think it's a tremendous tool. I've actually built a similar tool, so I understand the complexity. Cursor is just a class of its own. This is agency. This entire year, AI in 2025, has been about agents. There are limitations with agents as well, specifically around long-running tasks. The biggest problem today with agents is context management, actually. When you reach the limits of a model's context, even with the vector store or whatever it may be, an agent can't perform tasks that take hours and hours, multi-step tasks, yet. That's where we're going.
This is the state today. This is agentic RAG, and agents in general. The cool thing about agents is that the language model is just a piece of the puzzle. Let's explore how we can make this even more practical and enable an agent to not just read, but also write, to do stuff for me. I suck at managing a calendar. Let me show you an agent in its prime, one that doesn't just read, but also writes. I built this, and I actually use it myself: an agent to manage my calendar. I pull up a Google Calendar component, and I turn on tool mode. I'm just going to go here, do this. I can choose what it can or can't do on my calendar. Should I allow it to delete events? I don't know. I'm just going to say, do everything, it's fine. Sometimes you need to run before you can walk. This is my calendar, welcome. What I'm going to do is open my calendar here in split view. We'll do that. I'm going to now come here, and it can act on my calendar. I can say, make a lunch appointment for me today at 1 p.m., Europe/Berlin time. Some models struggle with dates, so I'll say today, October 16th, 2025, just in case.
If you're building an application, you can inject this in your prompt yourself. This looks good. I'm going to send this off. I'm going to now open my calendar. Hopefully I open it fast enough so we can see it appear somewhere here, open in split view. It's creating a Google Calendar event. "Your lunch appointment has successfully been scheduled at 1 p.m., in the Europe/Berlin time zone". It did it. We can see the event here. Here we go, lunch appointment. You see that? It totally did it. This didn't exist before. You saw my calendar. It decided I need one and a half hours for lunch. That's the agent working. You might ask, what if I want to use this, but I don't want to use it in this Langflow UI? You can't build a business and expect your customers to go use Langflow. No, you want them to use your application.
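Injecting the current date is a one-liner worth showing, since models are shaky with relative dates like "today". A minimal sketch in Python:

```python
from datetime import date

# Pin "today" explicitly in the prompt so the model doesn't have to
# guess the date from its frozen training data.
today = date.today().strftime("%B %d, %Y")
prompt = (f"Today is {today}. "
          "Make a lunch appointment for me today at 1 p.m., "
          "Europe/Berlin time.")
print(prompt)
```

The same idea applies to any real-time fact the model can't know: inject it, don't hope for it.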
Ideally, you want them to use no application, just to do a job. How can we do that? Langflow, thankfully, is cool, because this diagram that I drew here with my calendar and stuff, I can expose this whole thing over HTTP. This whole flow now becomes an API, and it's open source. It's running on localhost. It's a Docker container. I can put it wherever I want. If I do a network fetch to this endpoint, I will just trigger this entire flow and give the output to a user. Any frontend can just talk to this as-is. This would make this calendar agent accessible from any user interface that you build for your company or whatever.
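Calling the flow from your own frontend is just an HTTP POST. The sketch below follows Langflow's run API as I understand it; treat the exact path, the flow id, and the payload shape as assumptions to verify against your running instance (it builds the request without sending it).

```python
import json
import urllib.request

# Assumptions to check against your Langflow instance: the base URL,
# the /api/v1/run/<flow-id> path, and the payload field names.
BASE_URL = "http://localhost:7860"   # local Langflow container
FLOW_ID = "your-flow-id"             # placeholder

def ask_flow(message: str) -> urllib.request.Request:
    """Build a POST request that triggers the whole flow with a
    chat message and returns the flow's output to the caller."""
    payload = {"input_value": message,
               "input_type": "chat",
               "output_type": "chat"}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/run/{FLOW_ID}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = ask_flow("What's on my calendar today?")
print(req.full_url)
# urllib.request.urlopen(req) would actually trigger the flow.
```

Any frontend, in any language, can make the same request, which is the whole point: the agent becomes an API.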
AI, in the Future
I want to talk about the future, 2026 and beyond, where I believe the web, and software in general, is going, and I think it's important to prepare us all for that. I think people are not going to care about your product UI. That's probably the best way to say it. There's a new standard in the world of AI called MCP, Model Context Protocol. Anyone familiar with MCP? Some would say it's too early, it's too experimental. It's not. Stripe is using it. Neon is using it. Databricks is using it. Microsoft is using it. There's a rumor that the next version of macOS will have MCP support at the OS level. Believe me, it's not the wild west. It's not too early. MCP allows you to use agentic tools from anywhere, without any UI. What if I wanted to use this calendar agent in ChatGPT? Can I do that? I can, over MCP. How MCP works is, it's a standard server-client protocol, like HTTP. Let's talk about HTTP, actually. A browser. A browser is an HTTP client. Your website has an HTTP server.
The browser says, server, at this domain, give me the webpage, and the server gives it. It's client-server. MCP works exactly the same way, with an MCP client, like ChatGPT. ChatGPT is an MCP client. An MCP client opens up, registers with an external MCP server, and says, MCP server, what context do you have for me? Model Context Protocol. What context do you have for me? Context can be tools, like the calendar tool, the calculator tool, the web search tool. Or it can be things that aren't tools. It can be prompts. It can be conversation history. It can be a number of things. What context do you have for me? What you can do with this is share tools: you can extend the capabilities of ChatGPT with your own custom tools, like your calendar manager. This means you never actually have to leave ChatGPT to do work.
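The handshake in miniature looks like this. MCP speaks JSON-RPC under the hood; the message shapes below are a simplified sketch of the client asking "what context do you have for me?" and a server answering with tool descriptions, not a full or exact implementation of the spec.

```python
import json

# Toy MCP-style exchange: the client sends tools/list, the server
# answers with its tools. Field names follow my reading of the spec;
# verify against the real protocol before relying on them.
def handle(request: dict) -> dict:
    if request["method"] == "tools/list":
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "result": {"tools": [{
                "name": "calendar",
                "description": "use this tool to manage my calendar",
                "inputSchema": {
                    "type": "object",
                    "properties": {"action": {"type": "string"}},
                },
            }]},
        }
    return {"jsonrpc": "2.0", "id": request["id"],
            "error": {"code": -32601, "message": "method not found"}}

client_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
response = handle(client_request)
print(json.dumps(response, indent=2))
```

Once the client has that list, the tool descriptions go to the model, exactly like the calculator's "use this for arithmetic," and the model can start calling them.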
Let me show you this. I think practically this makes more sense. Let's use my calendar thing, but not use it via Langflow and not use it via some web interface, but use it over MCP. Langflow exposes also an MCP server. I'm going to come in here, and instead of API, I'm going to share this as an MCP server. Now I can edit my tools. This doesn't make any sense, this description. I'm going to say, use this tool to manage my calendar. Calendar, like that. I'll give it the name calendar, calendar tool. Now I can expose this flow over MCP and quickly add it to any of these MCP clients. These are all MCP clients. I'll add it maybe to Claude by clicking here. I'm going to quit Claude, I'm going to open it again. We have Claude. Claude is exactly like ChatGPT. It's just a competitor. Look at this, the user interface is kind of the same.
If I open Claude and I go to my tools, you can see lf starter_project. That's the thing that I just made, the Langflow starter project. Now I can say, what's on my calendar today? Send it off. Now I'm not using Langflow, but just some generic MCP client, something like Claude. Check it out. It's using the calendar tool that I just made, over MCP, because when I opened Claude, it said, MCP server, what tools do you have? It gave it the calendar tool. Just like that, Claude has its capabilities extended, and it has these items on your calendar today. Indeed. There's the lunch appointment that I just created, for 90 minutes. You see that? Claude can do this on my behalf.
Let's talk about UX. Previously, if I wanted to manage my calendar, I would have to go to some calendar app, like Google Calendar. First of all, there are cookie banners. There's Google saying your Google Workspace is about to expire. There are all kinds of notifications. Then I get there, but I have no mouse, and I need to create a new event. How do you create a calendar event? You click and drag. If we're here on Google Calendar, check out this UX. Fantastic UX. It's so great. If I want to make an event, I do this. You see that? What if I have no mouse? That's hard. Do I go here? Maybe. What if the page doesn't have JavaScript? What if it's not accessible to a screen reader? The web is just a collection of decisions that were made, sometimes bad ones. I say this as a web engineer. We build things like carousels that are not really accessible. We build accordions. The UI is challenging.
Traditionally, I would have to go use some web interface and click on things that maybe I can't click on, click on things that maybe need JavaScript. What if I'm in a country with poor network connectivity? The web is hard. This is how it used to be. How do we fix this? Maybe some new company comes along, invests millions in a great UI team, and they do great UI work. What if we don't need UI at all? Wouldn't that be something? You see that? That's exactly what I mean. What if we didn't need any of this, and instead I could just do everything I need from here? Do you understand? What if this is my home, and I never need to leave my house and go venture across the wild, wide web, with its accessibility problems and cookie banners and all?
What if I didn't need to navigate that, but I just spoke and my calendar was managed, my email was managed? I could buy stuff on Amazon by just saying, buy it here. What if that was the future? I think that is the future. That's literally what companies like OpenAI are going for. They want ChatGPT to replace the browser and to replace your web browsing experience. They want you to live inside ChatGPT. Again, to just make sure this metaphor lands, how it works pre-AI, pre-MCP, is we open a browser and we leave our house. We go to amazon.com. We go to Zillow. We go to these other websites, we leave our house, and the websites have different standards for accessibility.
Some of them don't work, there are cookie banners, whatever. What if we never needed to leave our house? Everything is in one app, which is Claude or ChatGPT or some MCP client. That's the future. Because here's the deal, what if we could even use a computer through this? In fact, Cursor just released support for this, which I think is very awesome. In the Cursor agent, you can just open a browser and say, go on amazon.de and find clothes. Cursor is also an MCP client. That's what it's going to do now. It's literally navigating to this website in a browser and telling you what it sees. I don't really need to go do this myself. The benefit here is that I can just speak things into existence. Here, it says: the amazon.de homepage has successfully loaded; now I need to accept the cookie dialog.
The best thing about this is I have an agent to do all of it. As you can see from those three dots on the bottom, it's generating. It accepted the cookies. Then the question becomes, what do I do while it generates? What do I do while my agent does my shopping? What do I do while my agent books my flights? What do I do while my agent manages my calendar? The answer is, I live my life. I don't spend hours looking through emails only to delete spam after spam after spam. I don't manage a calendar and do all that. No, I actually have a family that I like. I have friends. I like doing sports. A lot of us live today wanting to get more time back. I think that's true. A lot of us spend a lot of time doing nonsense. I have to get my boarding pass for my flight out of Munich. How cool would it be if an agent did that for me while I'm here talking to you? In fact, that's exactly what happened. I have a team of agents working for me all the time. It's great. I have so much time to do the things I actually want. This is the vision I have for all of us. That's where I believe AI is going in 2026 and beyond.
OpenAI recently had a DevDay where they showcased some of this, where they showed that they agree with me about where the web is going. Consolidating on a single platform like OpenAI is not where I want it to go; I want this to be distributed. Here is what they released, in case you missed it. I think this is very important to show you. This is their DevDay YouTube event. There's a really awesome demo a presenter did around the 11-minute mark. This is so cool. What he's doing is @-mentioning Zillow, which is a home search platform for property. This connects to Zillow and embeds an actual interactive map of properties inside ChatGPT. This is ChatGPT. Then you can find your apartment, find your houses. You can expand the map fully. When you click on a house, you can expand the map and scroll, scroll, scroll.
Then, finally, you can say, I would like to buy this one, or I would like to rent this one. All of this happens not on some separate website with its own cookie banners, but inside ChatGPT, or Claude, or any MCP client: Cursor, Windsurf, whatever. MCP allows this beautiful experience that, for me, is a lot better than varied experiences across the web. You can actually build this today. There's an SDK to get your apps into ChatGPT that I think is worth taking advantage of. In fact, if you have something customer-facing or external-facing and you're not there, you may be missing customers. This talk started with UX. We talked about GPT-1, GPT-2, GPT-3, and how nobody cared until there was a chat UI on top of it. That's where we started, and we'll finish with UX. UX is changing. UX used to be, go to a bunch of websites. Now it's, just prompt things into existence. What does that look like for you? I think it's worth considering. This is indeed where AI is going in the future.
Next Steps
Let's talk about next steps. The tool that I use to build those workflows and then expose MCP servers and APIs, it's called Langflow. It's free. It's open source. There's no money to be made from it. I think it's just a great tool. If you want to check it out, it's langflow.org/desktop. That QR code will take you there as well.