
AI Lab Extension Allows Podman Desktop Users to Experiment with LLMs Locally

One year after its 1.0 release, Podman Desktop announced the Podman AI Lab plugin, promising to help developers start working with large language models (LLMs) on their machines. Podman AI Lab streamlines LLM workflows, featuring generative AI exploration, a built-in recipe catalogue, curated models, local model serving, an OpenAI-compatible API, code snippets, and playground environments.

The plugin intends "to democratize" gen AI for application developers and to close the gap between "it works on my machine" and running in production on hybrid clouds. Currently, the supported ecosystems are Kubernetes and Red Hat OpenShift.

Developers can install the plugin from the Podman Desktop extensions catalogue; it requires Podman Desktop 1.10 or later.

To allow developers to build, test, and run gen AI-powered applications, the plugin promises to offer the "ingredients" to get to a first gen AI "Hello World":

Educational applications (recipes catalogue): under the name of "recipes", the plugin makes available sample applications that let developers discover and learn best practices for using gen AI in their applications. The recipes catalogue is a public repository to which you can contribute by submitting PRs.

Catalogue of curated models: an out-of-the-box list of ready-to-use open-source models. The plugin promises that the available models have been checked to ensure adherence to legal requirements (usually the Apache 2.0 or MIT open-source licenses). You can also import your own model files in GGUF (GPT-Generated Unified Format) format.
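GGUF is a binary container format used by llama.cpp-based runtimes: every GGUF file begins with the four ASCII magic bytes "GGUF" followed by a little-endian format version. As a minimal sketch (the helper name is our own, not part of the plugin), a quick sanity check before importing a model file could look like this:

```python
import struct


def looks_like_gguf(path: str) -> bool:
    """Check whether a file starts with the GGUF header.

    GGUF files begin with the ASCII magic "GGUF" followed by a
    little-endian uint32 format version.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False
    version = struct.unpack("<I", header[4:8])[0]
    return version >= 1
```

This only verifies the header, not the full metadata or tensor layout, but it catches the common mistake of pointing the import dialog at a non-GGUF file (e.g. a raw safetensors checkpoint).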

Local model serving: the application can generate code snippets for instant integration into developers' applications. To ease the transition between "online" and "local" models, the application exposes an OpenAI-compatible API. The plugin creates the inference server needed to interact with the model (based on llama.cpp) and warns the user when there are not enough resources to run the model locally. The inference server behaves like any other pod: in the pod view you can see the details and terminal output of each container, and, if needed, SSH directly into them.
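Because the API is OpenAI-compatible, existing client code can usually be pointed at the local server just by changing the base URL. As a hedged sketch (the base URL, port, and model name below are placeholders; Podman AI Lab shows the actual address of the inference server it starts), a dependency-free client using only the standard library could look like this:

```python
import json
import urllib.request

# Placeholder: Podman AI Lab displays the real host/port of the
# inference server it launches for your model.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(prompt: str) -> str:
    """POST the prompt to the local OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Swapping back to a hosted provider later means changing only `BASE_URL` (and adding an API key header), which is the "online versus local" transition the plugin aims to smooth.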

Playground environments: for testing or fine-tuning models. Whenever an application runs locally, Podman runs an inference server for its model in a container, and it displays all the already-running applications (recipes). When starting a new playground environment, a prompt assists in finding the best model and settings. The playground lets you select the temperature (which controls the randomness of the model's output, affecting its creativity and predictability), max tokens (the maximum length of the model's response, measured in tokens, roughly words), and top-p. The plugin can also define the general context (instructions and guidelines) applied to each query.

To further understand the project's mission and direction, InfoQ sat down with Stevan Le Meur, product manager of the project. Le Meur described the vision as follows:

Le Meur: With the rapid growth of Gen AI, AI-infused applications are now becoming the norm. Our mission is to provide application developers with the needed [local] tools to easily and cheaply develop and debug this new breed of application while keeping their data safe.

In our vision, a normal lifecycle of an application should be: start from an existing recipe, and try out different models until you find the suitable one for your use case. Tweak it by setting the needed parameters, or fine-tune it with InstructLab, even without ML experience. When you are happy with your outcome, you can make it "deployment-ready", transitioning from local to production with minimal differences between the two environments.

The new plugin for Podman Desktop helps with local experimentation and with migrating LLMs to production smoothly. Further, the roadmap hints that the team will explore areas like GPU acceleration, function-calling enablement, or local Retrieval-Augmented Generation (RAG). Given the rapidly changing ecosystem, they encourage you to provide feedback or to contribute to the open-source project.
