The Weird Economics of AI Inference: Are You Really Getting $8,000 of Value for $200?


There are countless posts claiming that your monthly AI usage plan is insanely subsidized. For instance:

"We are literally burning through VC money like crazy with our coding subscriptions. I read the $200 Anthropic sub gets you $8000 worth of API calls." — What happens when they stop subsidizing LLM subscriptions? (r/LocalLLaMA)

The logic behind this often comes from studies highlighting potential API costs, such as:

"A $200 ChatGPT Pro 20x subscription could cost as much as $14,000 in API pricing if fully utilized. Anthropic's Claude Max 20x plan... has a comparable ceiling, with potential usage totaling roughly $8,000 in token costs." — TechSpot

It is true that a subscriber who maxes out a plan could consume tokens that would cost $14,000 at per-token API prices. However, claims that AI providers are losing massive amounts of money per user are largely based on a misunderstanding of how AI inference economics work.

Let's try to explain the economics of AI inference, and how these plans actually function.

Let's start with some basic terminology

If you're not familiar with LLMs, here are a few concepts you should know:

AI inference: Running an already trained model to process data. (Here's a good explainer from Google).
Token: The smallest piece of information the model can process. Data gets broken down into individual tokens before processing. (Here's a primer from Nvidia).
Input vs Output token: Input tokens refer to the information you send to the model. Output tokens refer to the text generated by the model.

Providers typically offer a price for both. For instance, GPT-5.5 costs $5 per 1 million input tokens and $30 per 1 million output tokens.
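
To make per-token billing concrete, here is a minimal sketch that prices a single request at the rates quoted above. The request size (20,000 input tokens, 2,000 output tokens) and the function name are made up for illustration.

```python
# Prices quoted above, in dollars per 1 million tokens.
INPUT_PRICE_PER_MILLION = 5.00
OUTPUT_PRICE_PER_MILLION = 30.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in dollars for one request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 20,000 tokens in, 2,000 tokens out.
cost = request_cost(20_000, 2_000)
print(f"${cost:.2f}")  # $0.16
```

Note that output tokens are six times more expensive than input tokens at these rates, so a short answer to a long prompt can still be dominated by input cost.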

The Simple Reality

According to OpenAI's documents submitted to the SEC, they are actually turning a gross profit on AI inference. In 2025, their revenue was $13.07 billion, while their cost of revenue was $7.5 billion (via Ars Technica). Similarly, according to the Wall Street Journal, Anthropic turned an operating profit last quarter.

While I don't have their exact internal financial breakdown, it's safe to say AI providers have positive gross margins on AI inference services. But how can "getting $14,000 worth of tokens for $200" be true, while AI inference as a whole remains profitable?

The Gym Analogy: Why Inference is Profitable

The best analogy for AI inference is a gym:

It costs a massive amount of money upfront to build the gym and purchase machines.
The machines depreciate over time.
It costs money to upkeep and operate the machines.
You pay a fixed monthly price to access the gym.
When the gym gets busy, you might have to wait or get kicked out for hogging equipment.

The only difference? AI companies offer "per token" billing, whereas gyms don't offer "per rep" billing.

Imagine if a gym charged 2 cents every time you lifted a weight. A hardcore bodybuilder might say, "I only paid $100/month, but I did $4,000 worth of reps! The gym is losing $3,900 on me!"

AI has a massive upfront cost (servers, GPUs, data center space) and fixed operating costs (staff, rent, internet). The incremental cost is mostly electricity, water and data center maintenance.

If we look at an Nvidia H200 GPU running full tilt (700 W, which is 0.7 kWh of energy per hour):

Assuming electricity costs 10 cents per kWh.
Running 1 H200 for an hour costs about 7 cents.
If a model needs 4 H200s, running it for an hour costs 28 cents.

Top models generate about 50-70 output tokens per second. Assuming 60 per second, that's 216,000 tokens per hour per concurrent instance. Depending on the GPU requirements, generating these 216,000 tokens costs between 7 and 28 cents in electricity. At an API price of $30 per million output tokens, that's over $6 in tokens generated for a few cents of electricity.
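
The arithmetic above can be reproduced in a few lines. This is a rough sketch using only the article's assumptions (700 W per GPU, 10 cents per kWh, 60 output tokens per second, $30 per million output tokens). The constant and function names are illustrative, and the calculation covers electricity only, not cooling, hardware, or staff.

```python
GPU_POWER_KW = 0.7               # one H200 at full load (700 W)
ELECTRICITY_PRICE = 0.10         # dollars per kWh (assumption from the text)
TOKENS_PER_SECOND = 60           # assumed output speed of one instance
OUTPUT_PRICE_PER_MILLION = 30.00 # dollars per 1 million output tokens


def electricity_cost_per_hour(num_gpus: int) -> float:
    """Dollars of electricity used by num_gpus GPUs in one hour."""
    return num_gpus * GPU_POWER_KW * ELECTRICITY_PRICE


def token_value_per_hour() -> float:
    """API list price of the tokens one instance generates in an hour."""
    tokens = TOKENS_PER_SECOND * 3600  # 216,000 tokens
    return tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION


for gpus in (1, 4):
    power = electricity_cost_per_hour(gpus)
    value = token_value_per_hour()
    print(f"{gpus} GPU(s): ${power:.2f} electricity -> ${value:.2f} of tokens")
# 1 GPU(s): $0.07 electricity -> $6.48 of tokens
# 4 GPU(s): $0.28 electricity -> $6.48 of tokens
```

Even in the 4-GPU case, the list price of the tokens is more than 20 times the electricity bill.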

AI companies use their GPUs to convert pennies' worth of electricity into dollars' worth of tokens. As long as there is spare hardware capacity, the cost of each incremental token is trivial. Token prices are set to maximize profit; they do not reflect a high marginal cost of computing a token.

Subscriptions and Rate Limits

When you use an AI service heavily during peak times, you get rate-limited. Why? Because the provider doesn't have enough capacity, just like waiting for a bench press at 6 PM on a Monday. But during off-peak hours, generating tokens costs the provider very little, so you can run wild.

The studies claiming you get "$8,000 worth of tokens" usually rely on scripts that ping the service 24/7. Most of those tokens are generated during off-peak hours when the provider has excess capacity.
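
To get a feel for how much usage such a figure implies, here is a back-of-the-envelope sketch. It rests on loud assumptions: it reuses the article's example numbers (60 output tokens per second, $30 per million output tokens), applies them to the $8,000 figure even though that figure refers to a different provider's plan, and ignores input tokens entirely. The function name is hypothetical.

```python
TOKENS_PER_SECOND = 60            # assumed output speed of one stream
OUTPUT_PRICE_PER_MILLION = 30.00  # assumed API price, dollars per 1M tokens
HOURS_PER_MONTH = 30 * 24         # 720 hours in a 30-day month


def hours_to_reach(api_value_dollars: float) -> float:
    """Hours of nonstop generation needed to rack up a given API value."""
    dollars_per_hour = (
        TOKENS_PER_SECOND * 3600 / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    )
    return api_value_dollars / dollars_per_hour


hours = hours_to_reach(8_000)
streams = hours / HOURS_PER_MONTH
print(f"{hours:.0f} hours, about {streams:.1f} streams running nonstop")
# 1235 hours, about 1.7 streams running nonstop
```

Under these assumptions, a single stream generating around the clock for a whole month would still fall short of $8,000, which is consistent with the point that such totals come from automated, always-on usage rather than a person typing prompts.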

At the end of the day, AI inference is like an all-you-can-eat buffet. A few power users might eat into the margins, but the vast majority of users aren't running automated scripts to maximize token consumption. In fact, heavy users today mostly complain about throttling and usage limits precisely because providers protect their peak-time capacity.

If Inference is Profitable, Why Do AI Labs Lose Money?

There are many businesses where serving an incremental customer is nearly free, yet they still lose money (gyms, theaters, SaaS). They have high fixed and upfront costs.

AI labs lose money because of Research & Development. People pay OpenAI because they want the best models. To stay ahead, labs spend staggering amounts on R&D. OpenAI spent $19.18 billion on R&D last year, completely eclipsing their $7.5 billion cost of revenue.
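
Putting the figures quoted in this article side by side shows how a positive gross margin and an overall loss can coexist. This is only a sketch: it assumes the revenue, cost of revenue, and R&D numbers cover the same period, and it leaves out every other operating expense (sales, administration, and so on).

```python
# Figures quoted in the article, in billions of dollars.
REVENUE = 13.07
COST_OF_REVENUE = 7.5
RND_SPEND = 19.18

gross_profit = REVENUE - COST_OF_REVENUE
gross_margin = gross_profit / REVENUE
after_rnd = gross_profit - RND_SPEND

print(f"Gross profit: {gross_profit:.2f}B ({gross_margin:.1%} margin)")
print(f"Gross profit minus R&D: {after_rnd:.2f}B")
# Gross profit: 5.57B (42.6% margin)
# Gross profit minus R&D: -13.61B
```

On these numbers, serving customers earns roughly 43 cents of gross profit per dollar of revenue, but R&D spending alone is more than three times the gross profit.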

Furthermore, we don't know the true depreciation curve of these GPUs. Their value is tied to being cutting-edge. When a better GPU is released, the market value of older GPUs plummets, even though they still have utility.

So yes, your $200 subscription might involve massive capital expenditure behind the scenes, and margins aren't currently high enough to recoup the gargantuan upfront R&D costs. But the fundamental mechanism of AI inference, selling tokens, is not the giant money burner that social media posts make it out to be.