Original: Simon Willison · 20/02/2026
Summary
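At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they’re too low.
Key Insights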
“Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost.” — Discussing the role of prompt caching in improving Claude Code’s performance.
“A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they’re too low.” — Explaining the operational benefits of maintaining a high prompt cache hit rate for Claude Code.
Full Article
Published: 2026-02-20
Source: https://simonwillison.net/2026/Feb/20/thariq-shihipar/#atom-everything
Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost. […] At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they’re too low. — Thariq Shihipar
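The quote mentions alerting on the prompt cache hit rate but does not say how the rate is computed or what counts as "too low". As a rough sketch only: the field names below mirror the `usage` object returned by the Anthropic Messages API, while the `Usage` class, the helper functions, and the 0.8 threshold are hypothetical and are not Claude Code's actual alerting logic.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (field names mirror the Messages API usage object)."""
    input_tokens: int = 0                 # input tokens not served from or written to the cache
    cache_creation_input_tokens: int = 0  # input tokens written to the cache
    cache_read_input_tokens: int = 0      # input tokens served from the cache


def cache_hit_rate(usages):
    """Fraction of all input tokens that were read from the cache."""
    read = sum(u.cache_read_input_tokens for u in usages)
    other = sum(u.input_tokens + u.cache_creation_input_tokens for u in usages)
    total = read + other
    return read / total if total else 0.0


# Hypothetical threshold: the source does not state what "too low" means.
ALERT_THRESHOLD = 0.8


def should_alert(usages, threshold=ALERT_THRESHOLD):
    """True when the aggregate hit rate falls below the threshold."""
    return cache_hit_rate(usages) < threshold


if __name__ == "__main__":
    turns = [
        Usage(input_tokens=50, cache_creation_input_tokens=950),  # first turn writes the prefix
        Usage(input_tokens=50, cache_read_input_tokens=950),      # second turn reuses it
    ]
    print(cache_hit_rate(turns), should_alert(turns))
```

In a long-running session most turns should look like the second one, so the aggregate rate climbs as the conversation grows; a drop would suggest that something is invalidating the cached prefix.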
Related Topics
- [[topics/agent-native-architecture]]
- [[topics/prompt-engineering]]
- [[topics/claude-code]]
Related Articles
- [AINews] Anthropic's Agent Autonomy study (Swyx · explanation · 70% similar)
- Effective harnesses for long-running agents (Anthropic Engineering · how-to · 69% similar)
- I dream about AI subagents; they whisper to me while I'm asleep (Geoffrey Huntley · explanation · 68% similar)