Tokens, in depth: how to count and why it matters
Tokens are the currency of LLMs. You pay per token, you are limited per token, you are throttled per token. Learn to think in tokens or pay extra forever.
Tokens are like Uber's per-kilometer pricing. Words are the rider's mental model of distance. The driver only cares about meters. You are the driver now.
A token is a small chunk of text the model treats as one unit. English averages about 1.3 tokens per word. Hindi, Arabic, Tamil, and many other non-Latin scripts often run 2 to 4 tokens per word, depending on the tokenizer. Code often costs more tokens per character than English prose, because punctuation, indentation, and unusual identifiers tend to split into many small tokens.
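For a back-of-the-envelope budget before you reach for a real tokenizer, the ratios above can be turned into a rough estimator. This is only a sketch: the function name, the dictionary, and the "english" / "non_latin" labels are made up for illustration, the ratios are the approximate figures quoted in this lesson, and splitting on whitespace is a crude way to count words.

```python
# Approximate tokens-per-word ranges quoted in this lesson (assumptions, not
# measurements). Real counts depend on the tokenizer and the text.
TOKENS_PER_WORD = {
    "english": (1.3, 1.3),
    "non_latin": (2.0, 4.0),
}


def estimate_tokens(text: str, script: str = "english") -> tuple[int, int]:
    """Return a (low, high) token estimate based on a whitespace word count."""
    words = len(text.split())
    low, high = TOKENS_PER_WORD[script]
    return round(words * low), round(words * high)


sentence = "the quick brown fox jumps over the lazy dog again"
print(estimate_tokens(sentence))               # English-style estimate
print(estimate_tokens(sentence, "non_latin"))  # range for a non-Latin script
```

Treat the result as a planning number only. For billing or context-limit checks, count with the provider's own tokenizer.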
Input tokens (your prompt) and output tokens (the response) are usually priced differently. Output is more expensive almost everywhere.
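To see how separate input and output prices add up, here is a minimal cost calculator. The function name and the prices in the example call are placeholders chosen for illustration, not any provider's real rates. It assumes prices are quoted per million tokens, which is a common convention, so check your provider's pricing page.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output.
cost = request_cost(
    input_tokens=2_000,
    output_tokens=500,
    input_price_per_million=3.0,
    output_price_per_million=15.0,
)
print(f"Estimated cost: {cost:.4f}")
```

With these placeholder prices, the 500 output tokens cost more than the 2,000 input tokens, which is why trimming verbose responses often saves more than trimming prompts.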
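The tokenizer is provider-specific. OpenAI's GPT-4 and GPT-4 Turbo use cl100k_base (about 100K vocabulary), while GPT-4o uses o200k_base (about 200K vocabulary). Anthropic's Claude uses its own tokenizer. Google's Gemini uses a SentencePiece-based tokenizer. Token counts will differ across providers, and sometimes across models from the same provider, even for the same text.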
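In Python, count tokens for OpenAI models with the tiktoken library (pip install tiktoken):
import tiktoken

# o200k_base is the encoding used by GPT-4o; use "cl100k_base" for GPT-4 and GPT-4 Turbo.
enc = tiktoken.get_encoding("o200k_base")
text = "Hello, world!"
tokens = enc.encode(text)  # list of integer token IDs
print(len(tokens), tokens)  # token count, then the IDs themselves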