Our system uses sub-agents as a core part of its architecture.
That terminology can be confusing, because in other cases (and sometimes in our own architecture, like when executing thousands of operations via MAP) a sub-agent may be a smaller model given less complex individual tasks.
But the core mechanism we use for simulating unlimited context is to allow the main model to spin up instances of itself (sub-agents) with the previously summarized portion of the context expanded into its full, uncompressed state.
Expanding summaries into full text in sub-agents rather than the main thread is a critical part of our architecture, because it prevents the main context window from filling up.
That terminology can be confusing, because in other cases (and sometimes in our own architecture, like when executing thousands of operations via MAP) a sub-agent may be a smaller model given less complex individual tasks.
But the core mechanism we use for simulating unlimited context is to allow the main model to spin up instances of itself (sub-agents) with the previously summarized portion of the context expanded into its full, uncompressed state.
Expanding summaries into full text in sub-agents rather than the main thread is a critical part of our architecture, because it prevents the main context window from filling up.