Cost control patterns
Final lesson of the module. LLM bills grow with every call and every token, so cost control is a core engineering skill. Master these five patterns and your apps stay cheap.
Five money-saving habits at a restaurant: order what you need, share, take leftovers home, check the bill, set a monthly cap. Same five habits apply to LLM apps.
The five cost control patterns:
- Right-size the model: smaller model for easy tasks.
- Cache repeated calls: identical input -> stored output.
- Compress prompts: drop boilerplate, summarize history.
- Use prompt caching: providers reuse prefix tokens at lower cost.
- Hard caps: monthly spending limits at the provider dashboard.
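Two of these patterns, right-sizing and prompt compression, fit in a few lines each. The sketch below is a minimal illustration, not a recommended policy: the helper names `pick_model` and `trim_history`, the character threshold, and the choice of `gpt-4o` as the larger model are all assumptions made for the example. Note that `trim_history` simply drops older turns, which is a cruder form of compression than summarizing them.

```python
from typing import Dict, List

Message = Dict[str, str]

# Assumed values for illustration; substitute your provider's models and limits.
SMALL_MODEL = "gpt-4o-mini"
LARGE_MODEL = "gpt-4o"
EASY_TASK_MAX_CHARS = 2000


def pick_model(messages: List[Message], needs_reasoning: bool = False) -> str:
    """Right-size: use the small model unless the task looks hard."""
    total_chars = sum(len(m["content"]) for m in messages)
    if needs_reasoning or total_chars > EASY_TASK_MAX_CHARS:
        return LARGE_MODEL
    return SMALL_MODEL


def trim_history(messages: List[Message], keep_last: int = 4) -> List[Message]:
    """Compress: keep system messages plus only the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # A slice of [-0:] would return everything, so handle zero explicitly.
    recent = turns[-keep_last:] if keep_last > 0 else []
    return system + recent
```

A caller would typically run `trim_history` first and then pass the shortened list to `pick_model`, so the routing decision reflects what is actually sent.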
A simple memory cache:
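import hashlib
import json

_cache = {}  # in-process cache: request hash -> model output

def cached_ask(messages, model="gpt-4o-mini"):
    # call_llm is assumed to be your app's own provider wrapper, defined elsewhere.
    # Hash the model together with the messages so two models never share an entry.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    key = hashlib.sha256(payload.encode()).hexdigest()
    if key in _cache:
        return _cache[key]  # cache hit: no API call, no cost
    out = call_llm(messages, model=model)
    _cache[key] = out
    return out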
In production, replace the dict with a shared store such as Redis or Cloudflare KV, so cached answers survive restarts and are shared across processes.
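One thing a production store usually adds is expiry, so stale answers age out instead of being served forever. Redis, for example, supports per-key expiry. The sketch below imitates that behaviour in-process using only the standard library. The class name `TTLCache` and its interface are hypothetical, chosen for illustration, and it is a stand-in for a real shared store rather than a replacement for one.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a shared store with per-key expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        # Each write restarts the expiry window for that key.
        self._store[key] = (self._clock() + self._ttl, value)
```

Swapping this in for the plain dict means calling `get` and `set` with the same hashed key. A miss (`None`) triggers a fresh model call, whose result is then stored with `set`.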
Quick recall
Name three cost control patterns.
Quiz time
1. The fastest cost saving in most apps is