
Shreya Rajpal on Guardrails for Large Language Models

Live from the venue of the QCon San Francisco Conference, we are talking with Shreya Rajpal, CEO and co-founder of Guardrails AI. In this podcast, Rajpal shares her insights on building guardrails for large language model (LLM) applications. Rajpal discusses how Guardrails AI assesses the reliability and safety of LLM applications, ensuring that inputs sent to the model and outputs received from it are functionally correct, and providing a framework for developers to create their own custom validators.

Key Takeaways

  • Guardrails AI provides a framework for ensuring the reliability and safety of applications built on top of large language models (LLMs).
  • Guardrails checks for correctness and safety, based on various predefined criteria. These could range from making sure the LLM does not hallucinate to ensuring that the generated text does not contain profanity.
  • Guardrails AI uses a combination of code, machine learning models and external APIs to enforce these correctness criteria.
  • The framework is model-agnostic; it can work with both open-source and commercial models and even a random string generator.
  • It offers custom validators that can be combined into 'guards' to ensure the LLM abides by certain conditions. Users have the flexibility to set policies around these guards.

Transcript

Introduction

Roland Meertens: Welcome everyone to The InfoQ Podcast. My name is Roland Meertens and I'm your host for today. I am interviewing Shreya Rajpal, who is the CEO and co-founder of Guardrails AI. We are talking to each other in person at the QCon San Francisco conference, just after she gave the presentation called Building Guardrails for Enterprise AI Applications with Large Language Models. Keep an eye on InfoQ.com for her presentation, as it contains many insights into how you can add guardrails to your large language model application so you can actually make it work. During today's interview, we will dive deeper into how this works, and I hope you enjoy it and can learn from it.

Welcome, Shreya, to The InfoQ Podcast. We are here at QCon in San Francisco. How do you enjoy the conference so far?

Shreya Rajpal: Yeah, it's been a blast. Thanks for doing the podcast. I've really enjoyed the conference. I was also here last year and I had just a lot of fantastic conversations. I was really looking forward to it and I think it holds up to the standard.

Roland Meertens: All right, and you just gave your talk. How did it go? What was your talk about?

Shreya Rajpal: I think it was a pretty good talk. The audience was very engaged. I got a lot of questions at the end and they were very pertinent questions, so I enjoyed the engagement with the audience. The topic of my talk was on guardrails or the concept of building guardrails for large language model applications, especially from the lens of this open-source framework I created, which is also called Guardrails AI.

What is Guardrails AI [02:21]

Roland Meertens: What is Guardrails AI? What does it do? How can it help me out?

Shreya Rajpal: Guardrails AI essentially looks to solve the problem of reliability and safety for large language model applications. So if you've worked with generative AI and built applications on top of generative AI, what you'll often end up finding is that they're really flexible and they're really functional, but they're not always useful, primarily because they're not always as reliable. So I like comparing them with traditional software APIs. Traditional software APIs tend to have a lot of correctness baked into the API because we're in a framework, or a world, that's very deterministic. Compared to that, generative AI ends up being very, very performant, but essentially not as rigorous in terms of correctness criteria. So hallucinations, for example, are a common issue that we see.

So this is the problem that Guardrails AI tends to solve. It essentially acts like a firewall around your LLM APIs and makes sure that any input that you send to the LLM, or any output that you receive from the LLM, is functionally correct for whatever correctness might mean for you. Maybe that means not hallucinating, and then it'll check for hallucinations. Maybe it means not having any profanity in your generated text, because you know who your audience is, and it'll check for that. Maybe it means getting the right structured outputs. And all of those can basically be correctness criteria that are enforced.

Roland Meertens: If I, for example, ask it for a JSON document, you will guarantee me that I get correct JSON, but I assume that it can't really check any of the content, right?

Shreya Rajpal: Oh, it does. Yeah. I think JSON correctness is something that we do and something that we do well. But that's kind of like table stakes; in addition to that, it can also look at each field of the JSON and make sure that's correct. That works even if you're not generating JSON and you're generating string output. So let's say you have a question answering chatbot and you want to make sure that the string response that you get from your LLM is not hallucinated, or doesn't violate any rules or regulations of wherever you are; those are also functional things that can be checked and enforced.
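To make the field-level idea concrete, here is a minimal sketch in plain Python. The schema and checks are hypothetical and this is not the Guardrails AI API; it only illustrates checking that output parses as JSON and that each expected field passes its own criterion.

```python
import json

# Illustrative per-field checks; in a guardrails framework these would be
# validators attached to each field of an expected output schema (hypothetical fields).
FIELD_CHECKS = {
    "name": lambda v: isinstance(v, str) and len(v) > 0,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def validate_llm_json(raw_output: str) -> dict:
    """Parse LLM output as JSON and check every expected field, not just the syntax."""
    payload = json.loads(raw_output)  # raises if the output is not valid JSON at all
    errors = {}
    for field, check in FIELD_CHECKS.items():
        if field not in payload:
            errors[field] = "missing"
        elif not check(payload[field]):
            errors[field] = f"invalid value: {payload[field]!r}"
    if errors:
        raise ValueError(f"LLM output failed field-level validation: {errors}")
    return payload

# A structurally valid but semantically wrong response still fails, e.g.:
# validate_llm_json('{"name": "Ada", "age": 250, "email": "ada@example.com"}')
```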

Interfacing with your LLM [04:28]

Roland Meertens: So this is basically then like an API interface on top of the large language model?

Shreya Rajpal: I like to think of it as kind of like a shell around the LLM. So it acts as a sentinel at the input of the LLM and at the output of the LLM, and makes sure that there are no dangerous outputs or unreliable outputs or insecure outputs, essentially.

Roland Meertens: Nice. And is this something which you then solve with few-shot learning, or how do you then ensure its correctness?

Shreya Rajpal: In practice, how we end up doing it is with a bunch of different techniques, depending on the problem that we're solving. So for example, for JSON correctness, et cetera, we essentially look to see, okay, here's our expected structure, here's what's incorrect, and you can solve it by few-shot prompting to get the right JSON output. But depending on what the problem is, we end up using different sets of techniques. A key abstraction in our framework is this idea of a validator, where a validator basically checks for a specific requirement, and you can combine all of these validators together in a guard, and that guard will basically run alongside your LLM API and make sure that the guarantees we care about hold. And our framework is both a template for creating your own custom validators and orchestrating them via the orchestration layer that we provide, as well as a library of many, many commonly used validators across a bunch of use cases.
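As a rough illustration of this validator-and-guard abstraction, the sketch below composes two toy validators into a guard in plain Python. The class and function names are assumptions made for illustration, not the framework's own interfaces.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ValidationResult:
    passed: bool
    message: str = ""

# A validator checks one specific requirement of the LLM output.
def no_profanity(text: str) -> ValidationResult:
    banned = {"darn", "heck"}  # toy word list, purely for illustration
    hits = [w for w in text.lower().split() if w in banned]
    return ValidationResult(not hits, f"profanity found: {hits}" if hits else "")

def matches_regex(pattern: str) -> Callable[[str], ValidationResult]:
    compiled = re.compile(pattern)
    def check(text: str) -> ValidationResult:
        ok = compiled.fullmatch(text) is not None
        return ValidationResult(ok, "" if ok else f"output does not match {pattern!r}")
    return check

class Guard:
    """Runs a set of validators alongside an LLM call and reports every failure."""
    def __init__(self, validators: List[Callable[[str], ValidationResult]]):
        self.validators = validators

    def validate(self, llm_output: str) -> List[ValidationResult]:
        return [validator(llm_output) for validator in self.validators]

# Combine validators into one guard and run it over a model's output.
guard = Guard([no_profanity, matches_regex(r"[A-Z]{3}-\d{4}")])
assert all(result.passed for result in guard.validate("ABC-1234"))
```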

Some of them may be rules-based validators. So for example, we have one that takes any regex pattern that you provide and makes sure that the fields in your JSON, or any string output that you get from your LLM, match that regex. We have this one that I talked about in my talk, which you can check out on InfoQ.com, called Provenance. Provenance is essentially making sure that every LLM utterance has some grounding in a source of truth that you know to be true, right? So let's say you're an organization that is building a chatbot. You can make sure that your chatbot only answers from your help center documents, or from documents that you know to be true and that you provide to the chatbot, and not from its own world model of the internet that it was trained on.

So Provenance looks at every utterance that the LLM generates, checks to see where it came from in my documents, and makes sure that it's correct. And if it's not correct, that means it was hallucinated and can be filtered out. So we have different versions of it and they use various machine learning techniques under the hood. The simplest one basically uses embedding similarity. We have more complex ones that use LLM self-evaluation or NLI-based classification, that is, natural language inference. And so depending on what the problem is, we use either code or ML models or external APIs to make sure that the output that you get is correct.
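A minimal sketch of the simplest provenance variant she mentions, embedding similarity: each sentence of the answer must be close enough to some source chunk, otherwise it is flagged as a candidate hallucination. The embed function and the 0.75 threshold are assumptions; in practice you would plug in a real embedding model and tune the cutoff.

```python
import math
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def ungrounded_sentences(
    answer_sentences: List[str],
    source_chunks: List[str],
    embed: Callable[[str], List[float]],  # any embedding model; assumed, not prescribed
    threshold: float = 0.75,              # assumed cutoff; tune per application
) -> List[str]:
    """Return the answer sentences that are not close to any source chunk."""
    chunk_vectors = [embed(chunk) for chunk in source_chunks]
    flagged = []
    for sentence in answer_sentences:
        vector = embed(sentence)
        best = max((cosine(vector, cv) for cv in chunk_vectors), default=0.0)
        if best < threshold:
            flagged.append(sentence)  # candidate hallucination: filter out or re-ask
    return flagged
```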

Roland Meertens: Just for my understanding, where do you build in these guardrails? Is this something you built into the models or do you fine tune the model, or is this something you built into essentially the beam search for the output where you say, oh, but if you generate this, this path can't be correct? Do you do it at this level? Or do you just take the already generated whole text by your large language model and you then, in hindsight, kind of post-process it?

Shreya Rajpal: The latter. Our core assumption is that we're very... We abstract out the model completely. So you can use an open source model, you can use a commercial model. The example I like using is that in the extreme, you can use a random string generator and we'll check that random string generator for profanity or making sure that it matches a regex pattern or something.

Roland Meertens: That's like the worst large language model.

Shreya Rajpal: Exactly, the worst large language model. I guess it was a decision that allows developers to really be flexible and focus on more application level concerns rather than really wrangling their model itself. And so how we end up operating is that we are kind of like a sidecar that runs along your model. So any prompt that you're sending over to your LLM can first pass through guardrails, check to see if there's any safety concerns, et cetera. And then the output that comes back from your LLM before being sent to your application passes through guardrails.
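A sketch of that sidecar flow, under the stated assumptions: the prompt passes through input guards, any model (represented here by an opaque llm_call callable) produces a completion, and the completion passes through output guards before reaching the application. The function and parameter names are illustrative.

```python
from typing import Callable, List

def guarded_call(
    prompt: str,
    llm_call: Callable[[str], str],             # any model: open source, commercial, or a random string generator
    input_checks: List[Callable[[str], bool]],
    output_checks: List[Callable[[str], bool]],
) -> str:
    # 1. Validate the prompt before it ever reaches the model.
    if not all(check(prompt) for check in input_checks):
        raise ValueError("prompt rejected by input guardrails")

    # 2. Call the model; this layer is agnostic to what sits behind the callable.
    completion = llm_call(prompt)

    # 3. Validate the completion before the application sees it.
    if not all(check(completion) for check in output_checks):
        raise ValueError("completion rejected by output guardrails")
    return completion
```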

What constitutes an error? [08:39]

Roland Meertens: So are there any trade-offs when you're building in these guardrails? Are there some people who say, "Oh, but I like some of the errors?"

Shreya Rajpal: That's an interesting question. I once remember chatting with someone who was like, oh, yeah, they were building an LLM application that was used by a lot of people, and they basically said, "No, actually people like using us because our system does have profanity, and it does have a lot of things that for other commercial models are filtered out via their moderation APIs." And so there is an audience for that as well. In that case, what we end up typically seeing is that correctness means different things to different people. So for the person that I mentioned for whom profanity was a good thing, the correct response for them is a response that contains profanity. So you can essentially configure each of these to work for you. There's no universal definition of what correctness is, just as there's no universal use case, et cetera.

Roland Meertens: Have you already seen any applications where your guardrails have added a significant impact to the application?

Shreya Rajpal: I think some of my most exciting applications are either in chatbots or in structured data extraction. I also think that those are where most of the LLM applications today are. So if you're doing structured data extraction, you're taking a whole chunk of unstructured data, and from that unstructured data you're generating some table or a JSON payload that can then go into your data warehouses, like a row of data. So in that case, it's essentially making sure that the data you extract is correct, uses the right context, and doesn't veer too far off from the data you've historically received. I think that's a common use case.

I think the other one that I've seen is that you're building a chatbot and you care about certain concerns in that chatbot. For example, if you're in a regulated industry, making sure that no rules are violated, like misleading your customer about some feature of your product, et cetera; brand risk, using the right tone of voice that aligns with your brand's communication requirements. I think that's another common one. Checking for bias, et cetera, is another common one. So there tends to be a very diverse set of correctness criteria that people have for chatbots that we can enforce.

Enforcing bias [10:55]

Roland Meertens: So how do you enforce these things, for example, bias? Because I think that's something which is quite hard to grasp, especially if you have only one sample instead of seeing a large overview of samples.

Shreya Rajpal: I think this is another one of those things where, depending on the application or the use case, different organizations may have different desiderata. So for example, one of the things you can check for is essentially gendered language. Are you using very gendered language, or are you using gender-neutral language when you need to be, in your press briefs, et cetera? So that is one specific way of checking bias. But our core philosophy is to take these requirements and break them down into smaller chunks that can then be configured and put together.

Roland Meertens: I just remembered that Google Photos at some point had this incident where someone searched for gorillas and then found images of people. I think they just stopped using that keyword altogether, which is quite interesting.

Any other applications where you already saw a significant impact or do you have any concrete examples?

Shreya Rajpal: Yeah, let's see. If you go to our open source GitHub page, I think there are about a hundred or so projects that use Guardrails for enforcing their guarantees. I want to say most of them are around chatbots or structured data extraction. I see a lot of resume screening ones: going to someone's LinkedIn profile or looking at someone's resume and making sure that they're the right candidate for you by looking for specific keywords and how those keywords are projected onto the resume. So I think that's a common one. Yeah, I think those are some of the top-of-mind ones. Help center support chatbots are another common use case. Analyzing contracts, et cetera, using LLMs is another one, I think.

Roland Meertens: These sound like applications where you absolutely need to be sure that whatever you put there is-

Shreya Rajpal: Is correct, yeah.

Roland Meertens: ... is very correct. Yes.

Shreya Rajpal: Absolutely.

Roland Meertens: So what kind of questions did you get after the talk? Who was interested in this? What kind of questions were there?

Shreya Rajpal: The audience was pretty excited about a lot of the content. I think one of my favorite questions was around the cost of implementing guardrails, right? At the end of the day, there's no free lunch. This is all compute that needs to happen at runtime: looking at where the risk areas of your system are and safeguarding against those risk areas, which typically adds some amount of latency, some amount of cost, et cetera, as well. And so I think that was an interesting question about how we think about the cost of implementing that.

I think we've done a bunch of work in making the guardrails configurable enough that you can set a policy on each guardrail, a policy that allows you to say how much you care about something. Not every guardrail failure is a pull-the-alarm, horrible outcome. Some of them are bad, but you just shrug and move on. For some of them you take some programmatic action, for some of them you do more aggressive risk mitigation, and so that is configurable. And we've invested a bunch in making sure that they're low latency and can be parallelized very easily, et cetera.
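One way to picture those per-guardrail policies is a mapping from each check to an on-failure action. The policy names and example guardrails below are illustrative assumptions, not the framework's configuration syntax.

```python
from enum import Enum
from typing import Dict

class OnFail(Enum):
    NOOP = "noop"            # shrug and move on, maybe just log it
    FILTER = "filter"        # programmatically remove or fix the offending part
    REASK = "reask"          # send the failure back to the model for another attempt
    EXCEPTION = "exception"  # pull the alarm: refuse to return the output

# Hypothetical guardrail names mapped to how aggressively to react when they fail.
POLICIES: Dict[str, OnFail] = {
    "extra_json_fields": OnFail.NOOP,    # annoying, but not worth blocking on
    "profanity": OnFail.FILTER,
    "valid_api_spec": OnFail.REASK,
    "medical_advice": OnFail.EXCEPTION,  # either it's right or it's not useful at all
}

def policy_for(guardrail: str) -> OnFail:
    """Look up the configured reaction, defaulting to the strictest one."""
    return POLICIES.get(guardrail, OnFail.EXCEPTION)
```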

Priorities for content correctness [13:59]

Roland Meertens: So for example, I could say I absolutely want my output to be the right API specification, but it's okay if one of the categories didn't exist before, or isn't in my prompt?

Shreya Rajpal: Absolutely. Yeah, that's exactly right. A classic example I like using is that if you're in healthcare and you're building a healthcare support chatbot, you do not have the authorization to give medical advice to anyone who comes on. And so that's a guardrail, the no-medical-advice guardrail, where you'd much rather say, I might as well not respond to this customer and let a human come in if I suspect that there's medical advice in my output. So that's a guardrail where you either get it right or it's not useful to your customer at all, right? That's one of the ones where, even if it's slightly more expensive, you're willing to take that on. With a lot of the other ones, like you said, if there are some extra fields, et cetera, you're typically okay.

Roland Meertens: So what are the next steps then for Guardrails AI? What kind of things are you thinking about for the future? Do you get some requests all the time?

Shreya Rajpal: I think a common request that we get is much less a capability thing and more about making it easy for our users. We have support for a lot of the common models, but we keep getting requests every day to support Bard or support Anthropic, et cetera. So we have, like I said, a string-to-string translator where you can substitute your favorite model and use whichever one you want. But I think that's a common one: just add more integrations with other models that are out there.

Roland Meertens: Is there a winning model at the moment which everybody is going for?

Shreya Rajpal: I think OpenAI typically is the one that we see most commonly. Yeah. I think some of the other requests are more around specific features, like being able to create custom guardrails with less input involved. So like I mentioned, we have a framework for creating custom guardrails, but people ask, okay, how do I make it easier to see what's happening? I think better logging and visibility is another one. So a lot of exciting changes. A few weeks ago we released a big 0.2 release, which had a lot of these changes implemented in addition to a lot of stability improvements, et cetera, and we have more releases to come.

Roland Meertens: And so for fixing the errors, is this always hand-coded rules, or could you also send it back to a large language model and say, oh, we got this issue, try it again, fix this?

Shreya Rajpal: Yeah, so that's what we like to call the re-asking paradigm that we implemented. That actually was a core design principle behind Guardrails: these models have this very fascinating ability to self-heal. If you tell them why they're wrong, they're often able to incorporate that feedback and correct themselves. So Guardrails basically automatically constructs a prompt for you, sends it back, and then runs verification, et cetera, all over again. This is another one of those things that I walked through in my talk, which is available for viewers as well.

Fixing your output errors [16:48]

Roland Meertens: So then do you just take the existing output and then send the output back and say, "This was wrong, fix it?" Or do you just re-ask the question and hope that it gets it correct the next time?

Shreya Rajpal: That's a great question. So typically we work on the output level. We've done some prompt engineering on our end to configure how to create this prompt to get the most likely correct output. So we include the original request and we include the output. On the output, we do some optimization where, and this is configurable as well, you only re-ask the incorrect parts. So often you'll end up finding there's a specific localized area that's incorrect, either some field in the JSON, or, if you have a large string or a paragraph or something, some sentences in that paragraph. So you only send those back for re-asking and not the whole thing, and that ends up being a little bit less expensive.
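A sketch of that re-ask optimization, assuming JSON output and hypothetical helper names: only the failing fields and the reasons they failed go back to the model, and the corrected fields are merged into the original payload. The prompt wording is an assumption, not the framework's actual re-ask prompt.

```python
import json
from typing import Callable, Dict

def build_reask_prompt(original_request: str, output: dict, errors: Dict[str, str]) -> str:
    """Ask the model to fix only the fields that failed validation."""
    failing = {field: output.get(field) for field in errors}
    return (
        f"{original_request}\n\n"
        "Your previous answer had errors in these fields only:\n"
        f"{json.dumps(failing, indent=2)}\n"
        f"Reasons: {json.dumps(errors, indent=2)}\n"
        "Return corrected JSON containing only these fields."
    )

def reask(llm_call: Callable[[str], str], original_request: str,
          output: dict, errors: Dict[str, str]) -> dict:
    corrected = json.loads(llm_call(build_reask_prompt(original_request, output, errors)))
    return {**output, **corrected}  # merge the fixes back into the original payload
```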

Roland Meertens: Okay. Oh, interesting. So you only re-query the things which you know are wrong?

Shreya Rajpal: Right, right.

Tips to improve LLM output [17:42]

Roland Meertens: Ah, smart. Yeah, that must save a lot of money. And then in terms of correctness and safety, do you have any tips for people who are writing prompts, such that they can structure them better? Or how do you normally evaluate whether a prompt is correct?

Shreya Rajpal: I think my response is that I kind of disagree with the premise of the question a little bit. I go over this in my talk, but what you end up finding a lot of times is that people invest a lot of time and energy in prompt engineering, but at the end of the day, prompts aren't guarantees, right? First of all, LLMs are non-deterministic. So even if you have the best prompt figured out, if you send that same prompt 10 different times, you're going to see different outputs. You're not always going to get the right output.

I think the second is that the prompt isn't a guarantee. Maybe you're like, okay, this is what I want from you; this is the prompt communicating with the LLM: this is what I want from you, make sure you're not violating XYZ criteria, et cetera. There's absolutely nothing guaranteeing that the LLM is going to respect those instructions in the prompt, so you still end up getting incorrect responses. So what we say is: safer prompts, yes, definitely, a prompt is a way to prime the LLM to be more correct than normal. So you can still include those instructions, don't do XYZ, but verify. Make sure that those conditions are actually being respected, otherwise you're opening yourself up to a world of pain.

Roland Meertens: I always find it really cute if people just put things in there like, "Oh, you're a very nice agent. You always give the correct answer." Ah, that will help it.

Shreya Rajpal: One of my favorite anecdotes here is from a friend of mine who works with LLMs and has been doing that for a few years now, which is a few years ahead of a lot of other people getting into the area. I think one of her prompts was, "A man will die if you don't respect this constraint," which is how she wrangled the LLM into getting the right output. So people do all sorts of weird things, but I think she ended up moving onto this verification system as well. At the end of the day, you need to make sure that the conditions you care about are respected, and prompting alone just isn't sufficient.

Roland Meertens: I guess that's the lesson we learned today is always tell your LLM that someone will die if they get the answer incorrect.

Shreya Rajpal: Absolutely.

Roland Meertens: Yeah. Interesting. All right. Thank you very much for being on the podcast and hope you enjoy QCon.

Shreya Rajpal: Yeah, absolutely. Thank you for inviting me. Yeah, excited to be here.

Roland Meertens: Thank you very much for listening to this podcast. I hope you enjoyed the conversation. As I mentioned, we'll upload the talk on InfoQ.com sometime in the future, so keep an eye on that. Thank you again for listening, and thank you again Shreya for joining The InfoQ Podcast.
