Original: Geoffrey Huntley · 13/04/2025
Summary
Claude 3.7’s advertised context window is 200k, but I’ve noticed that the quality of output clips at the 147k-152k mark. In a previous post, I wrote about “real context window” sizes and “advertised context window” sizes.
Key Insights
“Claude 3.7’s advertised context window is 200k, but I’ve noticed that the quality of output clips at the 147k-152k mark.” — Discussing the limitations of current AI context windows.
“LLM context windows are like RAM in an IBM 8086 XT and are a precious resource, but engineers and developer tooling companies do not treat them as such.” — Comparing LLM context windows to historical RAM limitations.
“What if an agent could spawn a new agent and clone the context window?” — Speculating on the future possibilities of AI subagents.
Full Article
Published: 2025-04-13
Source: https://ghuntley.com/subagents/
In a previous post, I wrote about “real context window” sizes and “advertised context window” sizes.
Claude 3.7’s advertised context window is 200k, but I’ve noticed that the quality of output clips at the 147k-152k mark. Regardless of which agent is used, when clipping occurs, tool-call-to-tool-call invocation starts to fail.
The short version is that we are in another era of “640kb should be enough for anyone,” and folks need to start thinking about how the current generation of context windows is similar to RAM on a computer in the 1980s, until such time that DOS=HIGH,UMB becomes a thing…
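A minimal sketch of what guarding against that quality cliff might look like. The 148,000-token threshold and the 4-characters-per-token heuristic here are assumptions for illustration, not published figures; a real agent would use the provider's tokenizer.

```python
# Hedged sketch: detect when a conversation approaches the "real"
# (observed) context window, rather than the advertised one.
# REAL_CONTEXT_LIMIT and CHARS_PER_TOKEN are illustrative assumptions.

REAL_CONTEXT_LIMIT = 148_000   # observed clipping range is ~147k-152k tokens
CHARS_PER_TOKEN = 4            # rough heuristic; use a real tokenizer in practice

def estimate_tokens(messages: list[str]) -> int:
    """Rough token estimate from total character count."""
    return sum(len(m) for m in messages) // CHARS_PER_TOKEN

def is_redlining(messages: list[str], limit: int = REAL_CONTEXT_LIMIT) -> bool:
    """True when the conversation is past the observed quality cliff."""
    return estimate_tokens(messages) >= limit
```

An agent could use a check like this to stop, summarize, or delegate before output quality degrades, instead of blindly filling the advertised 200k.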
LLM context windows are like RAM in an IBM 8086 XT and are a precious resource, but engineers and developer tooling companies do not treat them as such.
The current generation of coding agents works via a tight evaluation loop of tool call after tool call, all operating within a single context window (ie. RAM). The problem with this design is that when an LLM produces a bad outcome, the coding assistant/agent death-spirals and brute-forces on the main context window, consuming precious resources as it tries to figure out the next steps.
the current generation of software development agents works like this. it’s not great (tm)
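The loop above can be sketched in a few lines. `call_llm` and `run_tool` are hypothetical stand-ins, not a real API; the point is that every turn, good or bad, accumulates in the one shared history.

```python
# Minimal sketch of the single-context-window agent loop described above.
# `call_llm` and `run_tool` are hypothetical callables, not a real SDK.

def agent_loop(task: str, call_llm, run_tool, max_steps: int = 10) -> list[str]:
    """Run tool-call-to-tool-call turns, accumulating everything in one history."""
    history = [task]                      # the single shared context window (ie. RAM)
    for _ in range(max_steps):
        action = call_llm(history)        # every turn re-reads the whole history
        if action == "done":
            break
        history.append(run_tool(action))  # tool output also lands in the same window
    return history
```

Note there is no eviction: a bad turn's output stays in `history` forever, which is exactly why a death spiral burns the window down.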
However, I’ve been thinking: What if an agent could spawn a new agent and clone the context window? If such a thing were possible, it would enable an agent to spawn a sub-agent. The main agent would pause, wait for the sub-agent to burn through its own context window (ie. SWAP), and then provide concrete next steps for the primary agent.
i suspect next generation agents will look something like this under the hood
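A speculative sketch of that idea: the main agent clones its context, a sub-agent burns through the copy (ie. SWAP), and only a compact summary flows back to the main window. Every name here is hypothetical; nothing like this is a shipping API.

```python
# Speculative sketch of the sub-agent design described above.
# `call_llm` is a hypothetical callable; all function names are assumptions.

import copy

def spawn_subagent(context: list[str], subtask: str, call_llm) -> str:
    """Clone the context window, work the subtask in the clone, return a summary."""
    swap = copy.deepcopy(context)    # the clone acts like SWAP space
    swap.append(subtask)             # the sub-agent explores here, not in the parent
    return call_llm(swap)            # only this summary returns to the main agent

def main_agent(context: list[str], subtask: str, call_llm) -> list[str]:
    """Main agent pauses, delegates, and appends only the concrete next steps."""
    next_steps = spawn_subagent(context, subtask, call_llm)
    context.append(next_steps)       # main window grows by one entry, not by the
    return context                   # sub-agent's entire exploration
```

The payoff is in the last two lines: the main context window pays for one summary instead of the sub-agent's whole brute-force trail.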
It’s theoretical right now, and I haven’t looked into it. Still, I dream of the possibility that in the future, software development agents will not waste precious context (RAM) and enter a death spiral on the main thread.
pps. extra reading
Building Multi-Agent Systems Scaling LLM-based agents to handle complex problems reliably. — Shrivu’s Substack
Related Topics
- [[topics/prompt-engineering]]
- [[topics/ai-agents]]
- [[topics/agent-native-architecture]]
Originally published at https://ghuntley.com/subagents/.