stories from the edge of modern engineering

Ever looked at your Claude Sonnet 4.5 cost chart and wondered what those coloured bars actually mean?

Today I was exploring how AI could help migrate a schema from DB2 to a modern cloud-native database engine. As a first step, I needed to spin up a DB2 instance locally inside a Docker container. Once it was running, I checked my Anthropic console to see how much the session had cost me.

After some research into what each category means, here’s how I interpret them — using a familiar engineering analogy: DB2 running in Docker on a local machine.


Prompt Caching Write ($0.47)

This is the one-time cost of writing your first large prompt into the cache, billed at a premium over the normal input rate.

Think of it like starting a fresh DB2 container — loading binaries, allocating memory, building buffer pools.

It’s setup-heavy but only happens once if you keep the session alive.
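To get a feel for what that $0.47 actually bought, we can back out a rough token count. The rate below is an assumption based on Anthropic's published Claude Sonnet pricing at the time of writing ($3.75 per million tokens for cache writes); treat it as a sketch, not a bill audit.

```python
# Back-of-envelope: how many tokens did a $0.47 cache write cover?
# Assumed rate: $3.75 per million tokens for cache writes (published
# Sonnet pricing at the time of writing -- check the current price list).
CACHE_WRITE_PER_MTOK = 3.75  # USD per million tokens

def tokens_for_spend(spend_usd: float, rate_per_mtok: float) -> int:
    """Back out an approximate token count from a dollar amount."""
    return round(spend_usd / rate_per_mtok * 1_000_000)

print(tokens_for_spend(0.47, CACHE_WRITE_PER_MTOK))  # ~125,000 tokens
```

So that single coloured bar likely represents a context on the order of 125k tokens being loaded into the cache once, much like DB2 filling its buffer pools on first start.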


Prompt Caching Read ($0.04)

This is the read-back from the cache — fast and cheap.

Just like reusing a warm DB2 container where data pages are already in memory.

The more you reuse, the more efficient (and cheaper) it becomes.
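The savings are easy to quantify. Assuming the published Sonnet rates at the time of writing ($3.00 per million input tokens versus $0.30 per million cache-read tokens), re-reading the same context from the cache is roughly ten times cheaper than sending it fresh:

```python
# Assumed Sonnet rates (USD per million tokens) at the time of writing.
INPUT_PER_MTOK = 3.00        # fresh, uncached input
CACHE_READ_PER_MTOK = 0.30   # reading the same tokens back from cache

def cost(tokens: int, rate_per_mtok: float) -> float:
    """Dollar cost of processing `tokens` at the given per-MTok rate."""
    return tokens / 1_000_000 * rate_per_mtok

tokens = 133_000  # roughly what $0.04 buys at the cache-read rate
print(f"fresh input: ${cost(tokens, INPUT_PER_MTOK):.2f}")       # $0.40
print(f"cache read:  ${cost(tokens, CACHE_READ_PER_MTOK):.2f}")  # $0.04
```

The same 133k tokens that cost $0.04 from the cache would have cost about $0.40 as fresh input, which is exactly why the warm-container analogy holds.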


Output ($0.12)

This is the cost of the model generating responses.

Comparable to DB2 executing SQL and returning result sets — cost scales with the number of tokens the model generates, just as DB2 work scales with the size of the result set.


Input ($0.00)

The uncached portion of my input was small enough that its cost rounds to zero.

Like sending a lightweight SQL query or config command.


Engineering Takeaway

The real cost isn’t model computation — it’s context management.

We can reuse cached prompts within their 5-minute lifetime, trim unnecessary context, and keep sessions “warm.”

Exactly the same mindset you’d apply to optimising DB2 containers: avoid cold starts, reuse buffers, and keep resources hot.
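That mindset can be sketched as a simple cost model. The numbers below are assumptions (the same published Sonnet rates used above, and a hypothetical 125k-token context held for ten conversational turns); the point is the shape of the comparison, not the exact dollars:

```python
# Sketch: cold starts vs a warm cache over a multi-turn session.
# Assumed Sonnet rates in USD per million tokens (check current pricing).
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

def session_cost(turns: int, context_tokens: int, cached: bool) -> float:
    """Cost of re-sending the same context on every turn of a session."""
    mtok = context_tokens / 1_000_000
    if cached:
        # One cache write, then cheap reads while the cache stays warm
        # (reuse must happen within the ~5-minute cache lifetime).
        return mtok * CACHE_WRITE + (turns - 1) * mtok * CACHE_READ
    # Cold start every turn: the full input rate, every time.
    return turns * mtok * INPUT

ctx = 125_000  # hypothetical context size, in tokens
print(f"10 cold turns: ${session_cost(10, ctx, cached=False):.2f}")  # $3.75
print(f"10 warm turns: ${session_cost(10, ctx, cached=True):.2f}")
```

Under these assumptions the warm session comes in around $0.81 against $3.75 cold — the same order of saving you get from reusing a running DB2 container instead of restarting it for every query.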
