How Tokens Work in LLMs: A Beginner's Guide to LLM Usage
If you're just getting started with AI, you've probably already run into the word "tokens." They come up every time you send a prompt, check usage stats, or hit a limit. But what actually are they?
Here's the practical breakdown.
So what is a token?
Tokens behave similarly across models trained largely on English and other Western languages. Roughly speaking:
4 characters = 1 token
That's the rule of thumb. A 100-word passage lands somewhere around 130-150 tokens depending on the words themselves. It's not exact, but it gives you a feel for the scale.
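The rule of thumb above is easy to turn into a quick estimator. This is a minimal sketch, not a real tokenizer: actual models use subword tokenizers whose counts vary word by word, so treat the result as a ballpark only.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    Real tokenizers (BPE and friends) will differ, especially for code,
    non-English text, or unusual words.
    """
    return max(1, round(len(text) / 4))

prompt = "Explain how tokens work in large language models."
print(estimate_tokens(prompt))  # 49 characters -> about 12 tokens
```

For accurate counts you'd use the provider's own tokenizer or token-counting endpoint, but for budgeting a prompt in your head, characters divided by four is usually close enough.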
Everything you send in and everything the model sends back is measured in tokens. That includes the thinking process you see in reasoning models, any files you attach, and the final response. It all counts.
Why watch token usage?
Because every model has limits and they're not all the same. If you're just learning AI and don't have a preference yet, Qwen by Alibaba has very generous context limits that you likely won't hit as an average user.
Beyond that, the differences become pretty clear once you start using these tools regularly.
How do different models handle tokens?
I've been using several models day to day and the token experience varies a lot:
ChatGPT — I pay $20.00 a month. On the paid tier I very rarely hit limits, but on the free tier I was hitting them constantly. The difference is noticeable.
Gemini — I pay the same as ChatGPT but sometimes hit the limit on Thinking or Pro. The upside is you can switch between models, and each one has separate limits. That flexibility matters.
Claude — I pay much more for this one, but it's my favorite model. Yes, the limits are rough, but that's largely because Claude powers what has become the most sought-after coding tool, Claude Code.
Haiku — The most token-efficient of the bunch, but with less powerful reasoning as a trade-off. If your task is straightforward and you want to conserve tokens, this is a solid choice.
Sonnet — Very good at large-document tasks that don't require deep reasoning. It can even give a rough estimate of token usage (an estimate, not an accurate count, which is worth noting). Handy when you're working with long texts and need a sense of where you stand.
Opus — The one that burns through tokens the fastest. It's the most capable model, but that capability comes at a cost. You'll notice it in long conversations.
What actually costs tokens?
Long prompts cost tokens at the rate described earlier: roughly 4 characters per token. All the thinking you see in reasoning models? It's billed like any other output, so a long reasoning trace costs as many tokens as a response of the same length.
Any files you attach or reference get factored in too. And of course the response itself.
So when you're on a tight limit, it's worth being selective: shorter prompts, fewer attached files, and picking the right model for the job.
Final thought
Tokens are the currency of LLM usage. Once you understand how they work and how different models handle them, you can make much smarter decisions about which model to use for what task and how to stay within your limits without constantly running into walls.