
Managing the Carbon Emissions Associated with Generative AI

Key Takeaways

  • There is increasing concern about carbon emissions as generative AI becomes more integrated into our everyday lives
  • Comparisons of carbon emissions between generative AI and the commercial aviation industry are misleading
  • Organizations should adopt best practices to mitigate the emissions specific to generative AI. Transparency requirements covering both the training and use of AI models could be crucial
  • Improving energy efficiency in AI models is valuable not only for sustainability but also for improving capabilities and reducing costs
  • Prompt engineering is key to reducing the computational resources, and thus the carbon emitted, when using generative AI. Shorter prompts and prompts that generate shorter outputs use less computation, suggesting a new practice of "green prompt engineering"

Introduction

Recent developments in generative AI are transforming our industry and our broader society. Language models like ChatGPT and Copilot are drafting letters and writing code, image and video generation models can create compelling content from a simple prompt, and music and voice models make it easy to synthesize speech in anyone’s voice and to create sophisticated music.

Conversations on the power and potential value of this technology are happening around the world. At the same time, people are talking about risks and threats.

From extreme worries about superintelligent AI wiping out humanity, to more grounded concerns about the further automation of discrimination and the amplification of hate and misinformation, people are grappling with how to assess and mitigate the potential negative consequences of this new technology.

People are also increasingly concerned about the energy use and corresponding carbon emissions of these models. Dramatic comparisons have resurfaced in recent months.

One article, for example, equates the carbon emissions of training GPT-3 to driving to the moon and back; another, meanwhile, explains that training an AI model emits massively more carbon than a long-distance flight.

The ultimate impact will depend on how this technology is used and to what degree it is integrated into our lives.

It is difficult to anticipate exactly how it will affect our day-to-day lives, but one current example, search giants integrating generative AI into their products, is fairly clear.

As per a recent Wired article:

Martin Bouchard, cofounder of Canadian data center company QScale, believes that, based on his reading of Microsoft and Google’s plans for search, adding generative AI to the process will require "at least four or five times more computing per search".

It’s clear that generative AI is not to be ignored.

Are carbon emissions of generative AI overhyped?

However, the concerns about the carbon emissions of generative AI may be overhyped. It's important to put things in perspective: the entire global tech sector accounts for 1.8% to 3.9% of global greenhouse-gas emissions, and only a fraction of those emissions is caused by AI[1]. Dramatic comparisons between AI and aviation or other sources of carbon create confusion by mixing very different scales: countless cars and aircraft travel millions of kilometers every day, while training a modern AI model like the GPT models happens only a relatively small number of times.

Admittedly, it’s unclear exactly how many large AI models have been trained. Ultimately, that depends on how we define "large AI model." However, if we consider models at the scale of GPT-3 or larger, it is clear that there have been fewer than 1,000 such models trained. To do a little math:

 

A recent estimate suggests that training GPT-3 emitted about 500 metric tons of CO2; Meta’s LLaMA model was estimated to emit 173 tons. Training 1,000 such 500-ton models would emit a total of about 500,000 metric tons of CO2. Newer models may push emissions somewhat higher, but 1,000 models is almost certainly an overestimate, which compensates for this. The commercial aviation industry emitted about 920,000,000 metric tons of CO2 in 2019[2], almost 2,000 times as much as LLM training, and keep in mind that this compares one year of aviation with multiple years of LLM training. The training of LLMs is still not negligible, but the dramatic comparisons are misleading. More nuanced thinking is needed.
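To make the arithmetic explicit, here is a quick back-of-the-envelope calculation using the figures cited above; all numbers are rough estimates rather than measurements.

```python
# Back-of-the-envelope comparison using the estimates quoted above.
per_model_training_tons = 500        # rough CO2 estimate for one GPT-3-scale training run
number_of_large_models = 1_000       # deliberately generous upper bound on models trained
aviation_2019_tons = 920_000_000     # commercial aviation emissions in 2019

total_training_tons = per_model_training_tons * number_of_large_models
print(total_training_tons)                       # 500,000 metric tons of CO2
print(aviation_2019_tons / total_training_tons)  # ~1,840, i.e. almost 2,000 times as much
```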

This, of course, only considers the training of such models. Serving and using the models also requires energy and has associated emissions. Based on one analysis, ChatGPT might emit about 15,000 metric tons of CO2 to operate for a year; another analysis suggests much less, at about 1,400 metric tons. Not negligible, but still small compared to aviation.

Emissions transparency is needed

But even if the concerns about the emissions of AI are somewhat overhyped, they still merit attention, especially as generative AI becomes integrated into more and more of our modern life. As AI systems continue to be developed and adopted, we need to pay attention to their environmental impact. There are many well-established practices that should be leveraged, and also some ways to mitigate emissions that are specific to generative AI.

Firstly, transparency is crucial. We recommend transparency requirements that allow monitoring of the carbon emissions related to both the training and use of AI models. This will let those deploying the models, as well as end users, make informed decisions about their use of AI based on its emissions, and incorporate AI-related emissions into their greenhouse gas inventories and net-zero targets. This is one component of holistic AI transparency.

As an example of how such requirements might work, France has recently passed a law mandating that telecommunications companies provide transparency reporting around their sustainability efforts. A similar law could require products incorporating AI systems to report carbon emissions to their customers, and require model providers to integrate carbon emissions data into their APIs.
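As a purely hypothetical illustration of what such API-level reporting could look like, the response shape below adds a carbon section alongside the usual token usage; none of these fields exist in any current provider's API.

```python
# Hypothetical API response carrying per-request carbon data.
# Field names and values are illustrative assumptions only.
response = {
    "model": "example-llm-v1",
    "usage": {
        "prompt_tokens": 412,
        "completion_tokens": 128,
    },
    "carbon": {
        "energy_kwh": 0.004,                   # estimated energy for this request
        "grid_intensity_gco2e_per_kwh": 400,   # carbon intensity of the serving region
        "emissions_gco2e": 1.6,                # energy_kwh * grid_intensity
        "region": "eu-west",
    },
}

# A customer could aggregate these figures into its greenhouse gas
# inventory and net-zero reporting alongside other cloud emissions.
total_emissions_g = response["carbon"]["emissions_gco2e"]
```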

Greater transparency can lead to stronger incentives to build energy-efficient generative AI systems, and there are many ways to increase efficiency. In another recent InfoQ article, Sara Bergman, Senior Software Engineer at Microsoft, encourages people to consider the entire lifecycle of an AI system and provides advice on applying the tools and practices from the Green Software Foundation to making AI systems more energy efficient, including careful selection of server hardware and architecture, as well as time and region shifting to find less carbon-intensive electricity. But generative AI presents some unique opportunities for efficiency improvements.

Efficiency: Energy use and model performance

As explored in Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning, the carbon emissions associated with the training or use of a generative AI model depend on many factors, including:

  • Number of model parameters
  • Quantization (numeric precision)
  • Model architecture
  • Efficiency of GPUs or other hardware used
  • Carbon-intensity of electricity used

The latter two factors are relevant for any software and are well explored by others, such as in the InfoQ article mentioned above. Thus, we will focus on the first three factors here, all of which involve some tradeoff between energy use and model performance.

It’s worth noting that efficiency is valuable not only for sustainability concerns. More efficient models can improve capabilities in situations where less data is available, decrease costs, and unlock the possibility of running on edge devices.

Number of model parameters

As shown in this figure from OpenAI’s paper, "Language Models are Few-Shot Learners", larger models tend to perform better.

This is also a point made in Emergent Abilities of Large Language Models:

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models.

We see that not only do larger models perform better at a given task, but entirely new capabilities emerge only as models get large. Examples of such emergent capabilities include adding and subtracting large numbers, toxicity classification, and chain-of-thought techniques for math word problems.

But training and using larger models requires more computation and thus more energy. We therefore see a tradeoff between a model’s capabilities and performance and its computational, and thus carbon, intensity.

Quantization

There has been significant research into the quantization of models, where lower-precision numbers are used in model computations, reducing computational intensity at the expense of some accuracy. Quantization has typically been applied to allow models to run on more modest hardware, for example, enabling LLMs to run on a consumer-grade laptop. The tradeoff between decreased computation and decreased accuracy is often very favorable, making quantized models extremely energy-efficient for a given level of capability. There are related techniques, such as "distillation", that use a larger model to train a smaller model that can perform extremely well for a given task.

Distillation technically requires training two models, so it could well increase the carbon emissions related to model training; however, it should compensate for this by decreasing the model’s in-use emissions. Distillation of an existing, already-trained model can also be a good solution. It’s even possible to leverage both distillation and quantization together to create a more efficient model for a given task.
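As a minimal sketch of the quantization idea, the snippet below applies PyTorch's dynamic quantization to a toy network so that linear-layer weights are stored as 8-bit integers; the toy model is an illustrative stand-in, and production LLM quantization typically uses more specialized tooling.

```python
import torch
import torch.nn as nn

# A toy network standing in for a much larger model (illustrative only).
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
)

# Dynamic quantization: weights of Linear layers are converted to int8,
# reducing memory and compute at the cost of some numeric precision.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
print(quantized_model(x).shape)  # same interface, cheaper inference
```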

Model Architecture

Model architecture can have an enormous impact on computational intensity, so choosing a simpler model can be the most effective way to decrease carbon emissions from an AI system. While GPT-style transformers are very powerful, simpler architectures can be effective for many applications. Models like ChatGPT are considered "general-purpose", meaning they can be used for many different applications. However, when the application is fixed, using such a complex model may be unnecessary. A custom model for the task may be able to achieve adequate performance with a much simpler and smaller architecture, decreasing carbon emissions. Another useful approach is fine-tuning -- the paper Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning discusses how fine-tuning "offers better accuracy as well as dramatically lower computational costs".
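As one concrete, hedged example of parameter-efficient fine-tuning, the sketch below wraps a small base model with LoRA adapters using the Hugging Face peft library so that only a tiny fraction of parameters is trained; the base model and hyperparameters are illustrative choices, and the paper cited above uses its own, related parameter-efficient method rather than LoRA.

```python
# Sketch of parameter-efficient fine-tuning with LoRA adapters (illustrative
# model and hyperparameters; not the exact method from the paper above).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_config = LoraConfig(
    r=8,               # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Only the small adapter matrices are trained, so fine-tuning for a specific
# task costs far less compute (and carbon) than training a model from scratch.
```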

Putting carbon and accuracy metrics on the same level

The term "accuracy" easily feeds into a "more is better" mentality. To address this, it is critical to understand the requirements of the given application – "enough is enough". In some cases, the latest and greatest model may be needed, but for other applications, older, smaller, possibly quantized models might be perfectly adequate. In some cases, correct behavior may be required for all possible inputs, while other applications may be more fault tolerant. Once the application and required level of service are properly understood, an appropriate model can be selected by comparing performance and carbon metrics across the options. There may also be cases in which a suite of models can be leveraged: requests are passed by default to a simpler, smaller model and handed off to a more sophisticated model only when the simple model can’t handle the task.
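A minimal sketch of such a model cascade is shown below; the two model functions and the confidence-based routing rule are hypothetical stand-ins for whatever models and escalation criterion a real system would use.

```python
from dataclasses import dataclass

@dataclass
class ModelResult:
    text: str
    confidence: float

def small_model(prompt: str) -> ModelResult:
    # Stand-in for a call to a small, cheap, low-carbon model.
    return ModelResult(text="small-model answer", confidence=0.9 if len(prompt) < 200 else 0.4)

def large_model(prompt: str) -> ModelResult:
    # Stand-in for a call to a larger, more capable, more expensive model.
    return ModelResult(text="large-model answer", confidence=0.99)

def answer(prompt: str, threshold: float = 0.8) -> str:
    """Route to the small model first; escalate only when it is not confident."""
    result = small_model(prompt)
    if result.confidence >= threshold:
        return result.text
    # Only the hard cases reach the big model, keeping the average
    # compute (and carbon) per request low.
    return large_model(prompt).text

print(answer("Summarize this short note."))
```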

Here, integrating carbon metrics into DevOps (or MLOps) processes is important. Tools like codecarbon make it easy to track and account for the carbon emissions associated with training and serving a model. Integrating this or a similar tool into continuous integration test suites allows carbon, accuracy, and other metrics to be analyzed in concert. For example, while experimenting with model architecture, tests can immediately report both accuracy and carbon, making it easier to find the right architecture and choose the right hyperparameters to meet accuracy requirements while minimizing carbon emissions.
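As a minimal sketch, codecarbon's EmissionsTracker can wrap a training or evaluation run so the estimated emissions are reported alongside accuracy; the training function here is just a placeholder.

```python
from codecarbon import EmissionsTracker

def train_and_evaluate() -> float:
    # Placeholder for the actual training and evaluation code.
    return 0.92  # accuracy

tracker = EmissionsTracker(project_name="architecture-experiment")
tracker.start()
try:
    accuracy = train_and_evaluate()
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2e for this run

# Reporting both metrics together lets a CI suite (or an experiment log)
# compare candidate architectures on accuracy *and* carbon.
print(f"accuracy={accuracy:.3f}, emissions={emissions_kg:.6f} kg CO2e")
```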

It’s also important to remember that experimentation itself will result in carbon emissions. In the experimentation phase of the MLOps cycle, experiments are performed with different model families and architectures to determine the best option, which can be considered in terms of accuracy, carbon and, potentially, other metrics. This can save carbon in the long run as the model continues to be trained with real-time data and/or is put into production, but excessive experimentation can waste time and energy. The appropriate balance will vary depending on many factors, but this can be easily analyzed when carbon metrics are available for running experiments as well as production training and serving of the model.

Green prompt engineering

When it comes to carbon emissions associated with the serving and use of a generative model, prompt engineering also becomes very important. For most generative AI models -- like GPT -- the computational resources used, and thus the carbon emitted, depend on the number of tokens passed to and generated by the model.

While the exact details depend on the implementation, prompts are generally passed "all at once" into transformer models, which might make it seem like the amount of computation doesn’t depend on the length of a prompt. However, the self-attention mechanism scales quadratically with sequence length, and implementations can avoid computing attention over unused portions of the input, so shorter prompts do save computation and thus energy.
For the output, it is clear that the computational cost is proportional to the number of tokens produced, as the model needs to be "run again" for each token generated.

This is reflected in the pricing structure for OpenAI’s API access to GPT-4. At the time of writing, the costs for the base GPT-4 model are $0.03/1k prompt tokens and $0.06/1k sampled tokens. The prompt length and the length of the output in tokens are both incorporated into the price, reflecting the fact that both influence the amount of computation required.

So, shorter prompts and prompts that will generate shorter outputs will use less computation. This suggests a new process of "green prompt engineering". With proper support for experimentation in an MLOps platform, it becomes relatively easy to experiment with shortening prompts while continuously evaluating the impact of both carbon and system performance.
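As a small illustration of how prompt variants can be compared, the sketch below uses OpenAI's tiktoken library to count tokens for two versions of a prompt; the per-token cost figure is the GPT-4 prompt price quoted above, used here purely as a proxy for relative compute, and the prompts themselves are made-up examples.

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

verbose_prompt = (
    "Please could you carefully read the following customer review and then "
    "write a detailed summary of all of the main points the customer makes: ..."
)
concise_prompt = "Summarize the main points of this customer review: ..."

for name, prompt in [("verbose", verbose_prompt), ("concise", concise_prompt)]:
    n_tokens = len(enc.encode(prompt))
    # Price used only as a rough proxy for relative compute/carbon.
    print(f"{name}: {n_tokens} prompt tokens (~${n_tokens * 0.03 / 1000:.5f})")

# In an MLOps experiment loop, the shorter prompt would also be checked
# against the task's quality requirements before being adopted.
```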

Beyond single prompts, there are also interesting approaches being developed to improve efficiency for more complex uses of LLMs, as in this paper.

Conclusion

Although possibly overhyped, the carbon emissions of AI are still a concern and should be managed with appropriate best practices. Transparency is needed to support effective decision-making and consumer awareness. Integrating carbon metrics into MLOps workflows can also support smart choices about model architecture, size, and quantization, as well as effective green prompt engineering. This article is only an overview and just scratches the surface. For those who truly want to do green generative AI, I encourage you to follow the latest research.

Footnotes

 
