Original: Simon Willison · 20/02/2026

Summary

At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate de

Key Insights

“Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost.” — Discussing the role of prompt caching in improving Claude Code’s performance.
“A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they’re too low.” — Explaining the operational benefits of maintaining a high prompt cache hit rate for Claude Code.

Full Article

# Quoting Thariq Shihipar
Author: Simon Willison
Published: 2026-02-20
Source: https://simonwillison.net/2026/Feb/20/thariq-shihipar/#atom-everything

Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost. […] At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they’re too low.
Thariq Shihipar
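
The alerting practice described in the quote can be illustrated with a minimal sketch. The field names below mirror the token usage fields that the Anthropic Messages API reports per response (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); everything else is an assumption for illustration: the `Usage` container, the helper names, the hit rate definition (cached reads as a share of all input tokens), and the 0.8 threshold. None of it is taken from Claude Code's actual monitoring.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Usage:
    """Per-request token usage (hypothetical container for illustration)."""
    input_tokens: int                 # input tokens processed without the cache
    cache_creation_input_tokens: int  # input tokens written to the cache
    cache_read_input_tokens: int      # input tokens served from the cache


def cache_hit_rate(usages: Iterable[Usage]) -> Optional[float]:
    """Share of all input tokens that were read from the cache.

    Returns None when there is no traffic, so callers can tell
    "no data" apart from "zero percent hit rate".
    """
    read = 0
    total = 0
    for u in usages:
        read += u.cache_read_input_tokens
        total += (
            u.input_tokens
            + u.cache_creation_input_tokens
            + u.cache_read_input_tokens
        )
    if total == 0:
        return None
    return read / total


def should_alert(usages: Iterable[Usage], threshold: float = 0.8) -> bool:
    """True when the hit rate falls below the (assumed) threshold."""
    rate = cache_hit_rate(usages)
    return rate is not None and rate < threshold


if __name__ == "__main__":
    window = [
        Usage(input_tokens=100, cache_creation_input_tokens=900, cache_read_input_tokens=0),
        Usage(input_tokens=50, cache_creation_input_tokens=0, cache_read_input_tokens=1000),
    ]
    print(cache_hit_rate(window), should_alert(window))
```

In this sketch the first request pays to populate the cache and the second reuses it, so the hit rate over a window rises as more roundtrips share the same prefix. A real monitor would likely aggregate over a time window and page someone rather than return a boolean.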

Topics

  • [[topics/agent-native-architecture]]
  • [[topics/prompt-engineering]]
  • [[topics/claude-code]]
