AI Image Generators: Energy Consumption & Costs

by drbyos

An international team of researchers from Stanford University and the insurance company AXA has investigated how the energy consumption of diffusion models, the architecture on which image-generating AI systems are built, can be systematically predicted. Popular examples are DALL-E, Midjourney, and Google’s Nano Banana. While the high energy consumption of language models such as ChatGPT and other transformer architectures is already widely known and has been studied scientifically, the equally compute-intensive diffusion models are only now moving into the focus of sustainability research.


Interview: Boris Ruf

Boris Ruf is a data scientist at AXA and an expert in sustainable AI.

In their research paper “Energy Scaling Laws for Diffusion Models,” which the scientists presented at a workshop at the EurIPS conference in early December, they show how the complexity of these algorithms can be modeled theoretically. The power consumption can then be derived from the number of floating-point operations (FLOPs) required to generate an image.
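The basic idea of deriving energy from a FLOP count can be sketched in a few lines. The efficiency figure below (FLOPs delivered per joule) is purely illustrative and not a value from the paper:

```python
def estimate_energy_wh(flops: float, flops_per_joule: float) -> float:
    """Convert a FLOP count into watt-hours (1 Wh = 3600 J)."""
    joules = flops / flops_per_joule
    return joules / 3600.0

# Example: 1e14 FLOPs on hardware with an assumed effective efficiency
# of 1e11 FLOPs per joule -> 1000 J, i.e. about 0.278 Wh.
print(round(estimate_energy_wh(1e14, 1e11), 3))
```

In practice the effective FLOPs-per-joule figure depends heavily on GPU model, precision, and utilization, which is exactly the variation the study measures.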

For the prediction, the researchers adapted OpenAI’s Kaplan scaling laws, which were originally developed to predict the performance of language models as a function of model size, amount of data, and computational effort. In the adapted variant, the laws estimate the energy consumption of diffusion models based on the required FLOPs. Open-source image generators such as Stable Diffusion, Flux, and Qwen were used for the experiments. The study considers various combinations of hardware, the number of steps in the generation process, the image resolution, and the calculation precision. Nvidia GPUs of the A100, RTX A4000, and RTX A6000 Ada series were used.
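Scaling laws of the Kaplan type are power laws, so fitting one amounts to a linear regression in log-log space. A minimal sketch with synthetic data (the paper’s actual coefficients are not reproduced here):

```python
import math

def fit_power_law(flops, energy_wh):
    """Least-squares fit of E = a * C^b in log-log space; returns (a, b)."""
    xs = [math.log(c) for c in flops]
    ys = [math.log(e) for e in energy_wh]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic measurements that follow E = 1e-13 * C exactly:
C = [1e12, 1e13, 1e14]
E = [1e-13 * c for c in C]
a, b = fit_power_law(C, E)
print(a, b)  # recovers the generating coefficients
```

With real measurements, `flops` would come from counting the operations of the denoising network across all sampling steps, and `energy_wh` from metered GPU power draw.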

The result: depending on the configuration, a single image can use up to ten times more energy than an average ChatGPT request, which, according to OpenAI CEO Sam Altman, requires about 0.34 watt-hours. The energy requirement varies considerably with resolution in particular, from 0.051 watt-hours per image at 512 × 512 pixels to 3.58 watt-hours at 1024 × 1024 pixels.
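The quoted figures can be checked directly; the ratios below use only the numbers stated in the article:

```python
# Figures from the article: 0.34 Wh per ChatGPT request (per Sam Altman),
# 0.051 Wh per image at 512x512, 3.58 Wh per image at 1024x1024.
chatgpt_wh = 0.34
image_low_wh = 0.051
image_high_wh = 3.58

print(round(image_high_wh / chatgpt_wh, 1))   # high-res image vs. one request
print(round(image_high_wh / image_low_wh, 1)) # spread across resolutions
```

The first ratio comes out at roughly 10.5, matching the “up to ten times more energy” claim; the resolution spread alone spans about a factor of 70.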

The researchers’ method is designed to work across models: fitted on one model, it can predict the energy consumption of other architectures, even on different hardware. This enables estimates for proprietary, closed systems such as DALL-E or Midjourney, whose operators do not yet publish consumption data.


The study provides a comprehensive, science-based approach to energy planning for AI image generators. Developers can use it to compare different diffusion models with regard to their energy consumption, and providers can estimate the expected energy consumption before deployment. The researchers hope these findings will promote the efficient development and deployment of AI-powered image and video generators.

The preprint of the study can be found on arXiv.

Transparency note: Boris Ruf is co-author of the study presented.


(pst)
