Tokens, in depth: how to count and why it matters

Tokens are the currency of LLMs. You pay per token, you are limited per token, you are throttled per token. Learn to think in tokens or pay extra forever.

Think of Uber's per-kilometer pricing. Words are the rider's rough mental model of distance; tokens are the distance the meter actually bills. From now on, read the meter.

A token is a small chunk of text the model treats as one unit. English averages about 1.3 tokens per word. Hindi, Arabic, Tamil, and many other non-Latin scripts often run 2 to 4 tokens per word, depending on the tokenizer. Code usually takes more tokens per character than plain English prose, because punctuation, indentation, and unusual identifiers tend to split into many short tokens.

Input tokens (your prompt) and output tokens (the response) are usually priced differently. Output is more expensive almost everywhere.
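The split pricing is easy to turn into arithmetic. A minimal sketch, assuming prices are quoted per million tokens; the function name and the example prices are placeholders for illustration, not any provider's real rates:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return the estimated cost of one request, in the currency of the prices."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Placeholder prices (assumed): 1.00 per million input tokens,
# 3.00 per million output tokens.
cost = estimate_cost(input_tokens=1_000, output_tokens=500,
                     input_price_per_million=1.00,
                     output_price_per_million=3.00)
print(f"{cost:.6f}")
```

Swap in the published prices for the model you actually call; the structure of the calculation stays the same.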

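The tokenizer is provider-specific. OpenAI's GPT-4o family uses o200k_base (roughly 200K vocabulary), while the older GPT-4 and GPT-3.5 Turbo models use cl100k_base. Anthropic's Claude uses its own tokenizer. Google's Gemini uses a SentencePiece-based tokenizer. Token counts will differ across providers, and sometimes across models from the same provider, even for the same text.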

In Python, count tokens with:

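import tiktoken

# o200k_base is the encoding used by the GPT-4o family
enc = tiktoken.get_encoding("o200k_base")
text = "Hello, world!"
tokens = enc.encode(text)  # list of integer token IDs
print(len(tokens), tokens)  # token count, then the IDs themselves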
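To see how much counts can differ, run the same text through two encodings. A sketch, assuming tiktoken is installed and treating its cl100k_base and o200k_base encodings as stand-ins for "two different tokenizers"; compare_counts and tiktoken_encoders are helper names made up for this example:

```python
from typing import Callable, Dict, List, Sequence

# An encoder is any function that turns text into a list of token IDs.
Encoder = Callable[[str], List[int]]


def compare_counts(text: str, encoders: Dict[str, Encoder]) -> Dict[str, int]:
    """Return the token count of `text` under each named encoder."""
    return {name: len(encode(text)) for name, encode in encoders.items()}


def tiktoken_encoders(
    names: Sequence[str] = ("cl100k_base", "o200k_base"),
) -> Dict[str, Encoder]:
    """Load tiktoken encodings by name (imported lazily; needs `pip install tiktoken`)."""
    import tiktoken

    return {name: tiktoken.get_encoding(name).encode for name in names}
```

Calling compare_counts("नमस्ते दुनिया", tiktoken_encoders()) returns one count per encoding. Non-English text is usually where the gap is largest.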

Visualize it

Three examples, with approximate counts that vary by tokenizer:

  • "ChatGPT is amazing" -> ~3 tokens
  • "Pneumonoultramicroscopicsilicovolcanoconiosis" -> ~10 tokens
  • def hello(): print("hi") -> ~6 tokens
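Counts like these are easier to trust once you can see where the text is cut. A small sketch, assuming a tiktoken Encoding object is passed in; token_pieces is a hypothetical helper name, while encode and decode_single_token_bytes are tiktoken's own methods:

```python
from typing import List


def token_pieces(enc, text: str) -> List[str]:
    """Return the text fragment behind each token, in order.

    `enc` is expected to behave like a tiktoken Encoding. A token can hold
    only part of a multi-byte character, so undecodable bytes are replaced
    rather than raising an error.
    """
    return [
        enc.decode_single_token_bytes(token_id).decode("utf-8", errors="replace")
        for token_id in enc.encode(text)
    ]
```

For example, token_pieces(tiktoken.get_encoding("o200k_base"), "ChatGPT is amazing") lists the fragments, and the length of that list is the token count.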

Try it now

Tokenize your own name, a famous quote, a JSON snippet, and a paragraph in your native language using https://platform.openai.com/tokenizer. Save the four counts.

Hands-on lab

Install Python and run:

pip install tiktoken

Then run a small script that takes a file and prints the token count. This is the building block of every cost estimator you will build.
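One possible version of that script, as a starting point. It assumes the file is UTF-8 text and picks o200k_base as the encoding; the file name count_tokens.py and the function names are illustrative choices, not part of tiktoken:

```python
import sys
from typing import Callable, List


def count_tokens_in_file(path: str, encode: Callable[[str], List[int]]) -> int:
    """Read a UTF-8 text file and return how many tokens `encode` produces."""
    with open(path, encoding="utf-8") as f:
        return len(encode(f.read()))


def main(path: str) -> None:
    # Imported here so the counting function can be reused and tested
    # with any encoder, even where tiktoken is not installed.
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")  # assumed encoding; pick yours
    print(f"{path}: {count_tokens_in_file(path, enc.encode)} tokens")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python count_tokens.py FILE")
```

Run it as python count_tokens.py notes.txt. Passing the encoder in as a function makes it easy to swap in another provider's tokenizer later.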

Try it now

If output tokens cost 3x input tokens and your typical request is 200 input + 800 output, what fraction of your bill is output?
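Once you have worked it out by hand, a few lines of Python can check the answer. A sketch in which the input price is treated as 1 unit per token and the output price as a multiple of it; output_share is a made-up helper name:

```python
def output_share(input_tokens: int, output_tokens: int,
                 output_price_ratio: float) -> float:
    """Fraction of the bill due to output tokens.

    `output_price_ratio` is the output price divided by the input price,
    so the input price itself cancels out of the result.
    """
    input_cost = input_tokens * 1.0
    output_cost = output_tokens * output_price_ratio
    return output_cost / (input_cost + output_cost)


share = output_share(input_tokens=200, output_tokens=800, output_price_ratio=3)
print(f"{share:.1%} of the bill is output")
```

Try a few other mixes, such as a long prompt with a short answer, to see how quickly the balance shifts.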

Common mistakes

  • Assuming every provider gives the same count
  • Forgetting that whitespace, newlines, and emojis count as tokens
  • Counting words instead of tokens when planning a context budget

Debugging tip

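When you see "context length exceeded", add up everything that counts against the window: system prompt + conversation history + new input. The tokens you reserve for the output (your max output setting) also count against the limit on most APIs, so a prompt that fits on its own can still fail.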
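The same bookkeeping can run as a pre-flight check before each call. A minimal sketch, assuming the provider counts the prompt and the reserved output against one shared window (common, but check your provider's documentation); the function name and all the numbers are hypothetical:

```python
def remaining_context(context_window: int, system_tokens: int,
                      history_tokens: int, input_tokens: int,
                      max_output_tokens: int) -> int:
    """Tokens left in the window; a negative result means the request will not fit."""
    used = system_tokens + history_tokens + input_tokens + max_output_tokens
    return context_window - used


# Hypothetical request: 128,000-token window, 4,096 tokens reserved for output.
left = remaining_context(context_window=128_000, system_tokens=1_200,
                         history_tokens=90_000, input_tokens=35_000,
                         max_output_tokens=4_096)
if left < 0:
    print(f"Over budget by {-left} tokens: trim history or lower the output reserve")
else:
    print(f"{left} tokens to spare")
```

Here the prompt alone (126,200 tokens) fits in the window, but adding the output reserve pushes the request over the limit.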

Challenge

Build a CLI that takes a file and prints (a) token count, (b) estimated input cost, (c) estimated total cost for a chat whose output is 1.5x the input token count, at current GPT-4o pricing.

Where this shows up

  • Pre-flight token budget check before calling expensive long-context models
  • Internal cost dashboards
  • Multilingual feature pricing decisions

From the field

The fastest cost savings in 2026 production usually come from tokenizer-aware prompt shaping: dropping unnecessary boilerplate, choosing models with better tokenizers for your target language, and using small models for high-frequency calls.

Recap

Tokens are billing units. Count them, budget them, optimize them. Every senior AI engineer thinks in tokens before they think in words.


Quick recall

Why are token counts provider-specific?
