GeekHub Learn
Module
Lesson 5.88 of 8 in this module2 min read Module 5: Using AI APIs (OpenAI, Gemini, Anthropic)

Cost control patterns

Final lesson of the module. Cost control is where engineers earn or lose their jobs. Master these five patterns and your apps stay cheap.

Five money-saving habits at a restaurant: order what you need, share, take leftovers home, check the bill, set a monthly cap. Same five habits apply to LLM apps.

The five cost control patterns:

  1. Right-size the model: smaller model for easy tasks.
  2. Cache repeated calls: identical input -> stored output.
  3. Compress prompts: drop boilerplate, summarize history.
  4. Use prompt caching: providers reuse prefix tokens at lower cost.
  5. Hard caps: monthly spending limits at the provider dashboard.

A simple memory cache:

import hashlib, json

_cache = {}

def cached_ask(messages, model="gpt-4o-mini"):
    key = hashlib.sha256(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    out = call_llm(messages, model=model)
    _cache[key] = out
    return out

In production, replace the dict with Redis or Cloudflare KV.

Visualize it

A waterfall chart: original cost -> after right-sizing -> after caching -> after compression -> after prompt caching, each step dropping the bar.

Try it now

Take any prompt you wrote this module. Estimate its monthly cost at 1,000 users x 5 calls/day. Apply each pattern. Re-estimate.

Hands-on lab

Add caching to your ask() function. Re-run the same prompt twice. Confirm the second call is free.

Try it now

Why does caching not work for chat with temperature > 0?

Common mistakes

  • Caching responses that include time-sensitive info
  • Forgetting that different users may need different responses
  • Not setting hard spending caps (every horror story starts here)

Debugging tip

If your bill spikes, the fix is usually one of: model too big, no caching, history grew unboundedly, no spending cap. Check in that order.

Challenge

Audit one of your own AI scripts. Apply the five patterns. Report the projected cost drop.

Where this shows up

  • Cost-aware production apps
  • Internal AI tools with strict budgets
  • Free-tier consumer apps

From the field

The engineer who can cut AI costs 80% without quality loss is the most valuable person on any AI team. This skill compounds quarterly as your usage scales.

Recap

Right-size, cache, compress, prompt-cache, cap. Five habits. Apply forever.


Quick recall

3 prompts · think before you flip

Prompt 1 of 3

Name three cost control patterns.

Quiz time

1 question · tap an answer to check it

  1. 1. The fastest cost saving in most apps is

Finished lesson 5.8?

Mark complete to update your module progress and unlock the streak.

Loading