Tokens and embeddings: how text becomes numbers
Neural networks do not understand text; they only do math on numbers. The conversion from text to numbers is where many engineers get confused, and it is also where every API price tag lives, because usage is typically billed per token.
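Think of a multilingual phone book. Every word gets a unique phone number, and words with similar meanings get numbers that sit close together. That neighborhood structure is the "embedding". The analogy is loose in one respect: in a real model the closeness lives in the vectors, not in the token ids themselves, which are arbitrary.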
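Tokens: not words. Tokens are sub-word units produced by the model's tokenizer (often Byte-Pair Encoding), and the exact split depends on which tokenizer is used. The English word "tokenization" might be 3 tokens. The Hindi word "नमस्ते" might be 5 tokens, because scripts that are less common in the tokenizer's training data tend to split into more pieces. Common code symbols often pack into 1 token.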
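To make the sub-word idea concrete, here is a minimal sketch of how Byte-Pair Encoding splits one word. It is a simplified illustration, not any real model's tokenizer: the merge list MERGES is hypothetical and hand-picked so that "tokenization" comes out as 3 pieces, matching the example above. A real tokenizer learns tens of thousands of merges from data.

```python
# Hypothetical merge rules, in priority order. A real BPE tokenizer
# learns these from a large corpus; these are invented for illustration.
MERGES = [
    ("t", "o"), ("k", "e"), ("to", "ke"), ("toke", "n"),
    ("i", "z"), ("a", "t"), ("i", "o"), ("io", "n"), ("at", "ion"),
]


def bpe_tokenize(word, merges):
    """Split a word into sub-word tokens by applying merges in order."""
    symbols = list(word)  # start from individual characters
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Fuse the pair wherever it appears next to each other.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


tokens = bpe_tokenize("tokenization", MERGES)
print(tokens)       # ['token', 'iz', 'ation']
print(len(tokens))  # 3
```

A word the merge list has never seen, such as "zip", falls back to single characters, which is why rare words and rare scripts cost more tokens.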
Embeddings: each token id is mapped to a fixed-length vector of floats (often 1,024 or 4,096 numbers). This vector is the model's internal representation of the token. Tokens with similar meanings end up with nearby vectors.
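"Nearby" is usually measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors (TOY_EMBEDDINGS is hypothetical; real embeddings have hundreds or thousands of dimensions and are learned, not written by hand) to show what the comparison looks like.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


close = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS["kitten"])
far = cosine_similarity(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS["invoice"])
print(close > far)  # True: "cat" sits nearer to "kitten" than to "invoice"
```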
The tokenizer is a deterministic algorithm that maps a string to a sequence of token ids drawn from a fixed vocabulary (often 50K to 200K entries). The model has an embedding matrix of shape [vocab_size, hidden_dim], with one row per token id. Looking up a token id retrieves that row as the token's starting vector. Subsequent layers then refine the vector using the surrounding context.
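The lookup step can be sketched in a few lines. This is an illustration under stated assumptions: VOCAB_SIZE and HIDDEN_DIM are tiny hypothetical values, and the matrix is filled with random numbers, whereas a trained model learns these values.

```python
import random

VOCAB_SIZE = 8   # hypothetical; real vocabularies are often 50K to 200K
HIDDEN_DIM = 4   # hypothetical; real models often use 1,024 or 4,096

# Embedding matrix of shape [VOCAB_SIZE, HIDDEN_DIM]: one row per token id.
rng = random.Random(0)
embedding_matrix = [
    [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids):
    """Return the starting vector (matrix row) for each token id."""
    return [embedding_matrix[token_id] for token_id in token_ids]


vectors = embed([3, 1, 3])
print(len(vectors), len(vectors[0]))  # 3 4
print(vectors[0] == vectors[2])       # True: same id, same starting vector
```

The same id always yields the same starting vector, wherever it appears in the sequence. Any context-dependent meaning comes from the later layers, not from the lookup.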
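The "context window" is measured in tokens, not characters or words. At the common rule of thumb of about 0.75 English words per token, a 128K context model can hold roughly 96,000 English words. Text in other languages, or code, may fit fewer words into the same window.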
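The arithmetic behind that estimate is short enough to write down. The 0.75 words-per-token ratio is an assumption (a rough rule of thumb for English), and estimate_words is a hypothetical helper, not part of any library.

```python
# Assumption: roughly 0.75 English words per token. The true ratio
# varies by tokenizer, language, and type of text.
WORDS_PER_TOKEN = 0.75


def estimate_words(context_tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many words fit in a context window of the given size."""
    return int(context_tokens * words_per_token)


print(estimate_words(128_000))  # 96000
```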