Stanford University Open-Sources Controllable Generative Language AI Diffusion-LM

Researchers at Stanford University have open-sourced Diffusion-LM, a non-autoregressive generative language model that allows for fine-grained control of the model's output text. When evaluated on controlled text generation tasks, Diffusion-LM outperforms existing methods.

The model and experiments are described in a paper published on arXiv. Diffusion-LM uses a plug-and-play control scheme, in which the language model's parameters are fixed and generation is steered by an external classifier that scores how well the generated text matches the desired parameters. Users can specify several features of the desired output, including the required parts of speech, a syntax tree, or the sentence length. During generation, Diffusion-LM iteratively denoises a set of latent vectors, with the external controller providing gradient updates that steer the latent vectors toward the desired output. When evaluated on a set of control tasks, Diffusion-LM "significantly" outperformed baseline methods. According to the research team,

We find the complex controls enabled by Diffusion-LM to be compelling, and we are excited by how Diffusion-LM is a substantial departure from the current paradigm of discrete autoregressive generation.

Many generative language models (LMs), such as GPT-3, are autoregressive; that is, they generate text recursively by predicting the next word in a sequence, appending that word to the existing sequence, and using the updated sequence as input for the next prediction. These models can generate text that is indistinguishable from human writing, and they can solve a wide range of problems, from question answering to interactive chat. However, it is difficult to give users control over properties of the generated output, such as sentence length, structure, or sentiment.
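To make the autoregressive loop concrete, here is a minimal Python sketch. The bigram table BIGRAM and the helpers next_token_probs and generate_greedy are hypothetical toy stand-ins: a real LM such as GPT-3 is a neural network that conditions on the entire prefix, and it usually samples from its predicted distribution rather than always taking the most likely word.

```python
from typing import Callable, Dict, List

# Hypothetical toy "model": next-word probabilities keyed on the previous word only.
# A real LM would condition on the whole prefix.
BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 1.0},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}


def next_token_probs(prefix: List[str]) -> Dict[str, float]:
    """Return the toy model's distribution over the next word."""
    return BIGRAM[prefix[-1]]


def generate_greedy(
    predict: Callable[[List[str]], Dict[str, float]], max_len: int = 10
) -> List[str]:
    """Autoregressive decoding: predict a word, append it, repeat."""
    seq = ["<s>"]
    for _ in range(max_len):
        probs = predict(seq)
        token = max(probs, key=probs.get)  # greedy choice of the most likely word
        seq.append(token)                  # the output becomes part of the next input
        if token == "</s>":
            break
    return seq[1:]  # drop the start marker


print(generate_greedy(next_token_probs))
```

Because each word is chosen only from the prefix to its left, nothing in this loop directly targets a global property of the finished sentence, such as its total length.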

One potential solution to this problem is to fine-tune the LM so that it accepts an additional control input, but this can be compute-intensive and may not generalize to multiple control parameters. Another solution is a plug-and-play technique, which keeps the LM's parameters frozen and steers generation with an external classifier that evaluates how close the generated output is to the desired parameters. However, steering autoregressive models this way has proved challenging.
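For contrast, plug-and-play control of an autoregressive LM can be sketched as reweighting the frozen model's next-word probabilities by an external classifier's score, roughly p(word | prefix, attribute) ∝ p_LM(word | prefix) · p(attribute | prefix + word). This is a generic illustration of the idea, not Diffusion-LM or any specific published method; controlled_step and the toy classifier are hypothetical. Note that in this scheme the classifier has to judge incomplete prefixes, one word at a time.

```python
from typing import Callable, Dict, List


def controlled_step(
    lm_probs: Dict[str, float],
    prefix: List[str],
    classifier: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Reweight a frozen LM's next-word distribution by an external classifier.

    classifier(seq) is assumed to return a probability in [0, 1] that seq
    has the desired attribute. The LM's parameters are never touched.
    """
    scores = {tok: p * classifier(prefix + [tok]) for tok, p in lm_probs.items()}
    total = sum(scores.values())
    if total == 0:
        return dict(lm_probs)  # classifier rejects every option: fall back to the LM
    return {tok: s / total for tok, s in scores.items()}


def likes_great(seq: List[str]) -> float:
    """Hypothetical sentiment classifier that favors the word "great"."""
    return 0.9 if "great" in seq else 0.1


print(controlled_step({"great": 0.3, "awful": 0.7}, ["it", "was"], likes_great))
```

Here the LM alone prefers "awful" (0.7), but after reweighting "great" receives about 0.79 of the probability mass.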

Instead of trying to steer an autoregressive LM, the Stanford researchers chose a technique new to language generation: a diffusion model. Diffusion models have shown good results in computer vision and other continuous domains; however, they had not been applied to text generation, which is a discrete domain. According to the team, Diffusion-LM is the first diffusion model for text generation.
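Diffusion models learn to reverse a gradual noising process. The sketch below shows only the forward (noising) half, in the widely used DDPM formulation, where a clean vector x_0 can be noised directly to step t as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε with ε ~ N(0, I). The linear β schedule and the function names are assumptions for illustration; Diffusion-LM's own noise schedule and its trained reverse (denoising) network are not shown.

```python
import math
import random
from typing import List


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Per-step noise variances; a linear schedule is assumed for illustration."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def add_noise(x0: List[float], t: int, abar: List[float], rng: random.Random) -> List[float]:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    a = abar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


abar = alpha_bars(linear_betas(1000))
rng = random.Random(0)
print(add_noise([1.0, -1.0], 0, abar, rng))    # barely changed
print(add_noise([1.0, -1.0], 999, abar, rng))  # essentially pure noise
```

At the last step ᾱ_t is close to zero, so x_t is almost pure Gaussian noise; generation runs the learned process in the opposite direction, starting from such noise.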

To make Diffusion-LM work, the team modified the standard diffusion model in two ways. First, they defined an embedding function that maps words into vectors in the continuous latent space of the diffusion model. Second, they defined a "rounding" method to map these vectors back to discrete words. To generate text, the model begins with a random vector in the latent space; this is treated as a noisy version of the output sentence's embedding. The model then iteratively denoises it; at each step, the embedding is passed to an external classifier, which produces a gradient update of the embedding for the next step of the iteration. When the iterations are done, the rounding method maps the final embedding to a text output.
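The generation procedure described above can be sketched as follows. Everything here is a hypothetical toy: the 2-D embedding table EMBED, the nearest-neighbor rounding, toy_denoise (a stand-in for Diffusion-LM's trained denoising network), and control_grad (a stand-in for the gradient of an external classifier, here simply rewarding a chosen word at a chosen position). Only the overall structure follows the description: embed words, start from noise, alternate a denoising step with a gradient step from the controller, then round back to words.

```python
import math
import random
from typing import Dict, List

Vec = List[float]

# Hypothetical fixed 2-D embeddings; in Diffusion-LM the embedding function is learned.
EMBED: Dict[str, Vec] = {
    "good": [2.0, 0.0],
    "bad": [-2.0, 0.0],
    "movie": [0.0, 2.0],
    "plot": [0.0, -2.0],
}


def embed(words: List[str]) -> List[Vec]:
    """Map discrete words to continuous vectors."""
    return [list(EMBED[w]) for w in words]


def nearest_word(v: Vec) -> str:
    return min(EMBED, key=lambda w: math.dist(v, EMBED[w]))


def round_to_words(xs: List[Vec]) -> List[str]:
    """'Rounding' (simplified here): map each vector to its closest word embedding."""
    return [nearest_word(v) for v in xs]


def toy_denoise(xs: List[Vec], strength: float = 0.5) -> List[Vec]:
    """Toy denoiser: pull each vector part of the way toward its nearest word."""
    out = []
    for v in xs:
        e = EMBED[nearest_word(v)]
        out.append([a + strength * (b - a) for a, b in zip(v, e)])
    return out


def control_grad(xs: List[Vec], position: int, target: str) -> List[Vec]:
    """Gradient of the toy control score -||x_position - embed(target)||^2."""
    grads = [[0.0] * len(v) for v in xs]
    grads[position] = [-2.0 * (a - b) for a, b in zip(xs[position], EMBED[target])]
    return grads


def generate(length: int, position: int, target: str, steps: int = 50,
             step_size: float = 0.25, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    # Start from random vectors: a "noisy version" of the sentence embedding.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(length)]
    for t in range(steps, 0, -1):
        xs = toy_denoise(xs)                            # denoising step
        g = control_grad(xs, position, target)          # controller's gradient
        xs = [[a + step_size * d for a, d in zip(v, gv)] for v, gv in zip(xs, g)]
        noise = 0.1 * (t - 1) / steps                   # shrinking noise, zero at the end
        xs = [[a + noise * rng.gauss(0.0, 1.0) for a in v] for v in xs]
    return round_to_words(xs)                           # back to discrete words


print(generate(length=3, position=0, target="good"))
```

The first word of the result is "good"; the other positions settle on whichever word their random starting vector drifted toward, because nothing constrains them. Fluency is not modeled at all in this toy.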

Diffusion-LM Architecture

Image source: https://arxiv.org/abs/2205.14217

The Stanford team evaluated Diffusion-LM on five classifier-guided text generation control tasks and compared it against baseline methods built on the autoregressive GPT-2 LM, using both plug-and-play control and fine-tuning. On all five tasks, Diffusion-LM outperformed the other plug-and-play methods; it also outperformed fine-tuning on two tasks and showed "similar" performance on the other three. The team also evaluated Diffusion-LM on an unguided text-infilling task against three baseline models; it outperformed two of them and achieved "comparable" performance to an autoregressive model trained specifically for infilling.

The team did find that Diffusion-LM was slower than other models in both training and runtime decoding. Its output also scored worse on perplexity, where lower values are better. In a Twitter thread about the work, lead author Xiang Lisa Li noted:
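Perplexity is the exponentiated average negative log-likelihood that a language model assigns to a sequence of tokens. A minimal sketch, assuming the per-token natural-log probabilities have already been obtained from some scoring model (the function name is illustrative):

```python
import math
from typing import List


def perplexity(token_log_probs: List[float]) -> float:
    """Perplexity = exp(-(1/N) * sum_i log p(token_i | context)).

    token_log_probs holds the natural-log probability a model assigned
    to each of the N tokens of a text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


print(perplexity([math.log(0.25)] * 4))
```

The example evaluates to 4, up to floating-point rounding: a model that gives every token probability 1/4 is, on average, as uncertain as a uniform choice among four words.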

Diffusion-LM shows strong performance in controllable generation, but it remains an open question whether it could match autoregressive LMs in [perplexity] and speed.

The Diffusion-LM code is available on GitHub.

About the Author

Anthony Alford
