
Why Certain AI Models Emit 50 Times More Greenhouse Gases to Answer Identical Questions

Written by Talia Ruiz

Large language models (LLMs) have swiftly become an integral part of our daily lives. However, their significant energy and resource demands may be accelerating our march toward climate disruption. A recent study reveals that some AI models emit substantially more greenhouse gases than others when processing identical queries.

Disparities in Carbon Emissions Among AI Models

According to research published in Frontiers in Communication, certain LLMs generate up to 50 times more carbon emissions than their counterparts for the same question. Intriguingly, models with higher accuracy generally come with the steepest environmental costs.

While the full environmental impact of LLMs remains challenging to quantify, prior studies have estimated that training ChatGPT consumed roughly 30 times more energy than an average American uses in a year. Yet it has remained unclear whether energy consumption varies significantly between models during routine question answering.

The Study: Comparing 14 Large Language Models

Researchers from Hochschule München University of Applied Sciences in Germany analyzed 14 LLMs ranging from 7 to 72 billion parameters—these parameters act as the model’s internal “knobs” fine-tuning its language understanding and generation. The team tested the models against 1,000 benchmark questions covering diverse topics.

Tokens, Reasoning, and Energy Use

LLMs convert words and phrases in a prompt into numerical representations called tokens. Some LLMs, particularly those designed for reasoning, insert additional “thinking tokens” during processing to enable deeper analysis before responding. This internal computation requires more energy, which results in increased CO2 emissions.
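To make the idea of tokens concrete, here is a minimal Python sketch using the open-source tiktoken library (an assumption for illustration; the models in the study use their own tokenizers) that shows how a prompt becomes the numeric tokens a model actually processes:

```python
import tiktoken

# "cl100k_base" is the tokenizer used by several recent GPT models; the models
# in the study use their own tokenizers, but the principle is identical.
encoder = tiktoken.get_encoding("cl100k_base")

prompt = "Why do some AI models emit more greenhouse gases than others?"
token_ids = encoder.encode(prompt)

print(token_ids)                       # the numeric IDs the model actually processes
print(len(token_ids), "prompt tokens")
# Every token a model generates in reply -- including any hidden "thinking"
# tokens -- costs extra computation, and therefore extra energy.
```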

The study found that reasoning models generate an average of 543.5 thinking tokens per question, compared with just 37.7 tokens for more concise models. For context, standard chat models such as GPT-3.5 and GPT-4o respond concisely, whereas reasoning-focused models such as DeepSeek R1 deliberately produce these intermediate thinking tokens before answering.
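As a rough illustration (a back-of-envelope calculation, not a figure from the paper), those averages alone imply roughly 14 times more generated tokens per question, before any differences in model size are taken into account:

```python
# Back-of-envelope comparison using the per-question averages reported in the study.
reasoning_tokens = 543.5   # avg. thinking tokens per question (reasoning models)
concise_tokens = 37.7      # avg. tokens per question (concise models)

ratio = reasoning_tokens / concise_tokens
print(f"Reasoning models generate roughly {ratio:.0f}x more tokens per question")
# If energy use scaled linearly with generated tokens (a simplifying assumption),
# the same factor would carry through to per-question CO2 emissions.
```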

Reasoning Models Drive Up Carbon Footprints

The extra computational steps in reasoning models lead to significantly higher energy demands. Maximilian Dauner, a lead researcher on the project, explained:

“The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach. We found that reasoning-enabled models produced up to 50 times more CO2 emissions than concise response models.”

Accuracy vs. Sustainability: A Trade-Off

The study highlights a clear trade-off: the more accurate a model, the greater its carbon emissions. For example, Cogito—a reasoning model with 70 billion parameters—achieved nearly 85% accuracy but emitted three times more CO2 than similarly sized concise models.

Dauner commented,

“Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies. None of the models that kept emissions below 500 grams of CO2 equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly.”
(CO2 equivalent is a standard metric to quantify the climate impact of various greenhouse gases.)
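A small sketch, using made-up numbers purely for illustration (only the Cogito figures above come from the study), shows how such an emissions budget can be screened against accuracy:

```python
# Illustrative, hypothetical per-model results: grams of CO2 equivalent for the
# full 1,000-question benchmark versus accuracy. Only the overall pattern --
# cheaper models staying below 80% accuracy -- mirrors the study's finding.
results = {
    "concise-7b":    {"co2e_g": 40,   "accuracy": 0.35},
    "concise-72b":   {"co2e_g": 450,  "accuracy": 0.78},
    "reasoning-8b":  {"co2e_g": 700,  "accuracy": 0.62},
    "reasoning-70b": {"co2e_g": 1300, "accuracy": 0.85},
}

BUDGET_G = 500  # the emissions ceiling mentioned in the quote above

for name, r in results.items():
    status = "within budget" if r["co2e_g"] <= BUDGET_G else "over budget"
    print(f"{name:14s} {r['co2e_g']:5d} gCO2e  {r['accuracy']:.0%}  {status}")
```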

Subject Matter Also Influences Emissions

The nature of the questions affects emissions too. Complex subjects such as abstract algebra or philosophy generated up to six times more emissions than straightforward topics, underscoring that both model type and query complexity shape environmental costs.

Limitations and Responsible Use

It’s important to note that these findings depend heavily on factors such as local energy grids and the specific models examined, so they should be generalized with caution. Nonetheless, the authors urge users to be “selective and thoughtful” in deploying LLMs.

Dauner advises,

“Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power.”
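One way to act on that advice is to route queries by difficulty, reserving large reasoning models for questions that genuinely need them. The sketch below is a hypothetical heuristic; the model names and keywords are placeholders, not part of the study:

```python
# Hypothetical query router: send easy questions to a small concise model and
# escalate to a large reasoning model only when the prompt looks genuinely hard.
HARD_HINTS = ("prove", "derive", "step by step", "abstract algebra", "philosophy")

def pick_model(prompt: str) -> str:
    """Return a placeholder model name based on a crude difficulty heuristic."""
    looks_hard = any(hint in prompt.lower() for hint in HARD_HINTS)
    return "reasoning-70b" if looks_hard else "concise-7b"

print(pick_model("What is the capital of France?"))                    # concise-7b
print(pick_model("Prove that every group of prime order is cyclic."))  # reasoning-70b
```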

Frequently Asked Questions

Why do some AI models emit more greenhouse gases than others?

AI models differ in how they process and answer questions. “Reasoning models” often perform more complex internal computations, generating more “thinking tokens” before producing a response. This additional processing consumes more energy, resulting in higher carbon emissions.

What are “tokens” and “thinking tokens”?

Tokens are the basic units of text that AI models use to understand and generate language. “Thinking tokens” are special internal steps used by some models to perform deeper reasoning before answering, which increases energy use.

Are more accurate AI models always worse for the environment?

Not always, but there is a trend. The study found that models with higher accuracy—especially reasoning models—typically consume more energy and emit more CO2. This presents a trade-off between performance and sustainability.

How much more CO2 do high-emission models produce?

The study showed that some models can emit up to 50 times more CO2 than others when answering the same question, depending on their reasoning complexity and token usage.

Does the type of question asked affect emissions?

Yes. Questions involving complex reasoning—such as in mathematics, philosophy, or abstract logic—require more processing, which can multiply emissions by up to 6 times compared to simpler topics.

Are these findings universal across all AI models?

Not entirely. The results depend on the models tested, the computational environment, and the structure of the local energy grid. However, the study provides strong evidence of patterns that are likely to apply broadly.

What is “CO2 equivalent” and why is it used?

CO2 equivalent (CO2e) is a standard unit that expresses the climate impact of all greenhouse gases in terms of the amount of CO2 that would have the same warming effect. It allows consistent comparison across different emission types.
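As a worked example of the conversion (the warming-potential values are approximate and the emission amounts are hypothetical):

```python
# Converting a mix of greenhouse gases into CO2 equivalent using approximate
# 100-year global warming potentials (GWP100); exact values vary by source.
GWP100 = {"co2": 1, "ch4": 28, "n2o": 265}

emissions_kg = {"co2": 10.0, "ch4": 0.5, "n2o": 0.01}   # hypothetical emissions

co2e_kg = sum(mass * GWP100[gas] for gas, mass in emissions_kg.items())
print(f"{co2e_kg:.2f} kg CO2e")   # 10.0 + 0.5*28 + 0.01*265 = 26.65 kg CO2e
```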

Conclusion

As large language models become more embedded in everyday tools—from search engines to customer service—they also bring with them a hidden environmental cost. The new research highlights a pressing reality: not all AI models are created equal when it comes to carbon emissions.

While high-performing reasoning models offer impressive accuracy, they often do so at a significant climate cost—emitting up to 50 times more CO₂ than more concise alternatives. This presents a critical trade-off between intelligence and sustainability.
