Tokens and embeddings: how text becomes numbers

Neural networks do not understand text. They only do math on numbers. The conversion from text to numbers is where many engineers get confused and where every API price tag lives.

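Think of a multilingual phone book. Every token gets a unique entry number (its token id). The number itself is arbitrary; what carries meaning is the address listed next to it. Words with similar meanings live at nearby addresses. That neighborhood structure is the "embedding".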

Tokens: not words. Tokens are sub-word units chosen by the model's tokenizer (often Byte-Pair Encoding). The English word "tokenization" might be 3 tokens. The Hindi word "नमस्ते" might be 5 tokens. Common code symbols often pack into 1 token. Exact counts depend on the tokenizer.
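To make "sub-word units" concrete, here is a toy character-level Byte-Pair Encoding sketch. It is not any provider's tokenizer: the function names and the tiny training list are made up for illustration, and real tokenizers train on huge corpora and usually work on bytes. The idea is the same, though: repeatedly merge the most frequent adjacent pair.

```python
from collections import Counter

def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Every word starts as a sequence of single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[apply_merge(symbols, best)] += freq
        corpus = merged
    return merges

def tokenize(word, merges):
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)

# Hypothetical training words, chosen only for this demo.
training_words = ["token", "tokens", "tokenize", "tokenization"]
merges = learn_merges(training_words, num_merges=4)
print(tokenize("token", merges))
print(tokenize("tokenization", merges))
```

Frequent character runs collapse into single tokens, while rarer endings stay split into smaller pieces. That is why one word can be 1 token and another 5.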

Embeddings: each token id is mapped to a fixed-length vector of floats (often 1,024 or 4,096 numbers). This vector is the model's internal representation. Similar meanings produce nearby vectors.
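"Nearby vectors" is usually measured with cosine similarity. The sketch below uses three hand-made 3-number vectors, invented purely for illustration; real embeddings are learned by the model and are far longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, not taken from any real model.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
cat_invoice = cosine_similarity(toy_vectors["cat"], toy_vectors["invoice"])
print(f"cat vs dog:     {cat_dog:.2f}")
print(f"cat vs invoice: {cat_invoice:.2f}")
```

With these made-up numbers, "cat" scores much closer to "dog" than to "invoice", which is the neighborhood idea in miniature.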

The tokenizer is a deterministic algorithm that maps strings to ids from a fixed vocabulary (often 50K to 200K entries). The model has an embedding matrix of shape [vocab_size, hidden_dim]. Looking up a token id retrieves its starting vector. Subsequent layers refine that vector contextually.
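The lookup step is simpler than it sounds: it is a row read from a table. Here is a minimal sketch with a tiny random matrix. The sizes, the seed, and the `embed` helper are assumptions for the demo; a real model learns these values during training.

```python
import random

VOCAB_SIZE = 8   # demo size; real vocabularies are far larger
HIDDEN_DIM = 4   # demo size; real models use much wider vectors

rng = random.Random(0)  # fixed seed so the demo is repeatable

# Embedding matrix of shape [VOCAB_SIZE, HIDDEN_DIM]: one row per token id.
embedding_matrix = [
    [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids):
    """Return the starting vector for each token id (a plain row lookup)."""
    return [embedding_matrix[token_id] for token_id in token_ids]

vectors = embed([3, 1, 3])
print(len(vectors), "vectors of length", len(vectors[0]))
print("same id, same starting vector:", vectors[0] == vectors[2])
```

Note that token id 3 gets the same starting vector both times. The lookup knows nothing about context; that is the job of the layers that come after it.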

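The "context window" is measured in tokens, not characters or words. At the common rule of thumb of about 0.75 English words per token, a 128K context model can hold roughly 96,000 English words. Other languages usually fit fewer words.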
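The 96,000-word figure is quick arithmetic. The sketch below assumes about 0.75 English words per token, a widely used rule of thumb rather than a guarantee. It also assumes the prompt and the reply share one window, which is true for many APIs but worth checking for yours.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough average for English text

def estimate_words(context_tokens, words_per_token=WORDS_PER_TOKEN):
    """Rough number of English words that fit in a window of this many tokens."""
    return round(context_tokens * words_per_token)

def fits_in_context(prompt_tokens, max_output_tokens, context_tokens):
    """True if the prompt plus the reserved reply budget fit in the window."""
    return prompt_tokens + max_output_tokens <= context_tokens

print(estimate_words(128_000))
print(fits_in_context(prompt_tokens=120_000, max_output_tokens=10_000,
                      context_tokens=128_000))
```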

Visualize it

Insert a 3-panel visual: (1) "Hello world" highlighted with token boundaries, (2) each token shown with its id, (3) each id shown landing in a vector neighborhood, with similar words nearby.

Try it now

On the OpenAI tokenizer page, type the same sentence in English, Hindi, and code. Compare token counts. The differences translate directly into your bill, so ignoring them is how you lose money.
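If you cannot reach the tokenizer page, UTF-8 byte counts give a rough offline hint. Many BPE tokenizers start from bytes, and Devanagari characters take 3 bytes each in UTF-8 while basic Latin letters take 1. This is only a proxy, not a token count, and the `utf8_profile` helper and sample strings are made up for this sketch.

```python
def utf8_profile(text):
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

samples = {
    "English": "hello",
    "Hindi": "नमस्ते",
    "code": "x += 1",
}

for label, text in samples.items():
    codepoints, utf8_bytes = utf8_profile(text)
    print(f"{label}: {codepoints} code points, {utf8_bytes} bytes")
```

The Hindi greeting needs three times as many bytes as it has code points, so a byte-based tokenizer has more raw material to chew through before any merges apply.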

Hands-on lab

In a notebook (or even pen and paper), estimate the cost of a chat that uses 500 input tokens and 800 output tokens, on a model that charges $1 per million input tokens and $3 per million output tokens. Answer at end of lesson.

Try it now

If a Hindi sentence is 3x more tokens than its English translation, what does that mean for cost and latency?

Common mistakes

  • Estimating context window in "words" instead of tokens
  • Assuming all languages cost the same per character
  • Ignoring that JSON output uses more tokens than plain text
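The JSON point is easy to see even without a tokenizer. The sketch below compares character counts for the same made-up record as JSON and as plain text. Characters are only a crude stand-in for tokens; for real numbers, run both strings through your model's tokenizer.

```python
import json

# Hypothetical record, invented for this demo.
record = {"name": "Asha", "city": "Pune", "plan": "pro"}

as_json = json.dumps(record)
as_text = "Asha, Pune, pro"

# Quotes, braces, colons, and repeated key names all add to the output.
print(len(as_json), "characters as JSON")
print(len(as_text), "characters as plain text")
```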

Debugging tip

If responses get cut off, your max_tokens is probably too low. If they are slow, your token counts may be too high: latency grows with both the input you send and the output you ask for. Both are token-driven.
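You can automate this check. The sketch below assumes a response dict shaped like an OpenAI Chat Completions response, where `finish_reason` is "length" when the output hit the token limit and `usage.prompt_tokens` reports input size. The `diagnose` helper and its 8,000-token threshold are hypothetical; pick a threshold that matches your own latency budget.

```python
def diagnose(response, slow_input_threshold=8000):
    """Return token-related warnings for one chat response (a plain dict)."""
    notes = []
    finish_reason = response["choices"][0].get("finish_reason")
    prompt_tokens = response.get("usage", {}).get("prompt_tokens", 0)

    if finish_reason == "length":
        notes.append("Reply was cut off at the token limit: raise max_tokens "
                     "or ask for a shorter answer.")
    if prompt_tokens > slow_input_threshold:
        notes.append("Large input: trim or summarize the prompt to cut latency.")
    return notes

# Hand-written sample response, not real API output.
sample = {
    "choices": [{"finish_reason": "length"}],
    "usage": {"prompt_tokens": 12000, "completion_tokens": 256},
}
for note in diagnose(sample):
    print(note)
```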

Challenge

Use the OpenAI tokenizer to find a paragraph that produces wildly different token counts in two languages. Document it as a one-page note.

Where this shows up

  • Token counting is required for billing dashboards.
  • Prompt compression libraries shrink token counts to fit context windows.
  • Internationalization: an Indian-language chatbot can be 3x more expensive without redesign.

From the field

In production in 2026, engineers regularly cut costs 30 to 50% by managing token usage: pre-summarizing inputs, switching to providers whose tokenizer handles their target language better, or moving to small models for short tasks.

Recap

Text becomes tokens, tokens become embeddings, embeddings are the language the model thinks in. Cost, context limits, and latency are all token-driven.

Hands-on Lab answer: 500 input tokens at $1/M = $0.0005. 800 output at $3/M = $0.0024. Total per chat = $0.0029.
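If you did the lab in a notebook, the same arithmetic fits in one small function. The `chat_cost` name and its parameters are our own choices for this sketch; the prices are the ones from the lab, not any real provider's rates.

```python
def chat_cost(input_tokens, output_tokens,
              input_price_per_million, output_price_per_million):
    """Cost in dollars for one chat, given per-million-token prices."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost

total = chat_cost(input_tokens=500, output_tokens=800,
                  input_price_per_million=1.0, output_price_per_million=3.0)
print(f"${total:.4f} per chat")
```

Multiply the result by your daily chat volume to see why output tokens, priced 3x higher here, dominate the bill.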


Quick recall

What is a token? Why is it not a word?

Quiz time

  1. The context window is measured in
