Another question is, why would earlier conversations need to be stored and recalled? They're irrelevant. Only records of the initial requirements and the work done, or work in progress, need to be stored.
You could definitely build a coding agent that way, and it sounds like you've done it. We store the conversation history because:
1. In our use of coding agents, we find that there are often things referenced earlier in the conversation (API keys, endpoint addresses, feedback to the agent, etc.) that it's useful to have persist.
2. This is a general-purpose LLM memory system, which we've just used here to build a coding agent. But it is also designed for personal assistants, legal LLMs, etc.
1. It does not store chat history, reasoning traces, etc., only workflow artifacts (requirements, codebase analysis, implementation plan, etc.). I frankly do not believe those things are relevant.
2. It is significantly simpler and more lightweight, using only markdown files.
"The Qwen 3.5 series 397B-A17B is a native vision-language model based on a hybrid architecture design. By integrating linear attention mechanisms with sparse Mixture-of-Experts (MoE), it achieves significantly higher inference efficiency. It demonstrates exceptional performance—comparable to current state-of-the-art frontier models—across a wide range of tasks, including language understanding, logical reasoning, code generation, agentic tasks, image and video understanding, and Graphical User Interfaces (GUI). Furthermore, it possesses robust code generation and agent capabilities, showing excellent generalization across various agent-based scenarios."
"The Qwen3.5 Native Vision-Language Series Plus model is built on a hybrid architecture that integrates linear attention mechanisms with sparse Mixture-of-Experts (MoE), achieving significantly higher inference efficiency. Across various task evaluations, the 3.5 series demonstrates exceptional performance comparable to current state-of-the-art frontier models. Compared to the Qwen 3 series, this model represents a massive leap forward in both text-only and multimodal capabilities."
Somewhat underwhelmed. I consider agents to be a sidetrack. The key insight from the Recursive Language Models paper is that requirements, implementation plans, and other types of core information should not be part of context but exist as immutable objects that can be referenced as a source of truth. In practice this just means creating a .md file per stage (spec, analysis, implementation plan, implementation summary, verification and test plan, manual QA plan, global state reference doc).
I created this using PLANS.md and it basically replicates a kanban/scrum process with gated approvals per stage, locked artifacts when it moves to the next stage, etc. It works very well and it doesn't need a UI. Sure, I could have several agents running at the same time, but I believe manual QA is key to keeping the codebase clean, so time spent on this today means that future requirements can be implemented 10x faster than with a messy codebase.
This is what I've been doing. Iterating on specs is better than iterating on code. More token efficient and easier to review. Good code effortlessly follows from good specs. It's also a good way to stop the code turning into quicksand (aside from constraining the code with e2e tests, CLI shape, etc).
But what is your concept of "stages"? For me, the spec files are a MECE decomposition, each file is responsible for its unique silo (one file owns repo layout, etc), with cross references between them if needed to eliminate redundancy. There's no hierarchy between them. But I'm open to new approaches.
The stages are modelled after a kanban board. So you can have whichever stages you think are important for your LLM development workflow. These are mine:
00: Iterate on requirements with ChatGPT outside of the IDE. Save as a markdown requirements doc in the repo
01: Inside the IDE: analysis of the current codebase based on the scope of the requirements
02: Based on 00 and 01, write the implementation plan. Implement the plan
03: Verification of implementation coverage and testing
04: Implementation summary
05: Manual QA based on generated doc
06: Update the global STATE.md and DECISIONS.md, which document the app and the what and why of every requirement
Every stage has a single .md file as output, and after the stage is finished the doc is locked. Every stage takes the previous stages' docs as input.
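The lock-and-inherit mechanics above can be sketched in a few lines of Python. This is a minimal illustration, not the commenter's actual setup: the `docs/` directory, file-naming scheme (`NN_name.md`), and function names are all assumptions, and "locking" is approximated by stripping POSIX write bits.

```python
import stat
import tempfile
from pathlib import Path

def finish_stage(doc: Path) -> None:
    """Lock a finished stage artifact by stripping all write bits,
    so later stages can read it but nothing can silently rewrite it."""
    mode = doc.stat().st_mode
    doc.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def stage_inputs(docs_dir: Path, stage: int) -> list[Path]:
    """A stage takes every earlier stage's doc as input (00 .. stage-1)."""
    return sorted(p for p in docs_dir.glob("[0-9][0-9]_*.md")
                  if int(p.name[:2]) < stage)

# Example: lock the requirements doc once stage 00 is approved.
docs = Path(tempfile.mkdtemp()) / "docs"
docs.mkdir()
req = docs / "00_requirements.md"
req.write_text("# Requirements\n")
finish_stage(req)
```

File permissions only deter accidental edits by the agent; a stricter harness could keep locked artifacts in a separate read-only mount or under version control with protected history.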
I have a half-finished draft with more details and a benchmark (need to re-run it since a missing dependency interrupted the runs)
An idea just came into my mind. What if an agent could spawn other agents and provide them with immutable resource files and a 'chrooted' mutable directory those spawned agents could use recursively to prepare immutable resources for other recursively called sub-agents. The immutability and chrooting could be enforced by the harness.
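A harness along those lines could be sketched as follows. All names here are hypothetical, and real enforcement would need OS-level sandboxing (chroot, containers, seccomp); this sketch only approximates immutability with read-only permissions and isolation with per-agent temp directories.

```python
import stat
import tempfile
from pathlib import Path

class AgentSandbox:
    """Give each spawned agent read-only resource files plus its own
    scratch directory; a finished scratch artifact can be frozen and
    handed down as an immutable resource to a recursively spawned
    sub-agent."""

    def __init__(self, resources: dict[str, str]):
        self.root = Path(tempfile.mkdtemp())
        self.resource_dir = self.root / "resources"  # immutable inputs
        self.scratch_dir = self.root / "scratch"     # mutable workspace
        self.resource_dir.mkdir()
        self.scratch_dir.mkdir()
        for name, text in resources.items():
            p = self.resource_dir / name
            p.write_text(text)
            p.chmod(stat.S_IRUSR)                    # owner read-only

    def spawn_child(self, artifact: str) -> "AgentSandbox":
        """Promote one of this agent's scratch artifacts into an
        immutable resource for a recursively spawned sub-agent."""
        text = (self.scratch_dir / artifact).read_text()
        return AgentSandbox({artifact: text})

# Usage: a parent agent drafts a plan, then hands it down frozen.
parent = AgentSandbox({"spec.md": "# Spec\n"})
(parent.scratch_dir / "plan.md").write_text("# Plan\n")
child = parent.spawn_child("plan.md")
```

Because each sandbox gets a fresh root, a sub-agent cannot reach back into its parent's workspace, which is the property the harness would need to guarantee.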
If you add a dialectic between Opus 4.5 and GPT 5.2 (not the Codex variant), your workflow - which I use as well, albeit slightly differently [1] - may work even better.
This dialectic also has the happy side-effect of being fairly token efficient.
IME, Claude Code employs much better CLI tooling+sandboxing when implementing while GPT 5.2 does excellent multifaceted critique even in complex situations.
[1]
- spec requirement / iterate spec until dialectic is exhausted, then markdown
- plan / iterate plan until dialectic is exhausted, then markdown
- implement / curl-test + manual test / code review until dialectic is exhausted
- update previous repo context checkpoint (plus README.md and AGENTS.md) in markdown
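The "iterate until the dialectic is exhausted" loop in the steps above can be sketched generically. The critic and reviser functions here are toy stand-ins for calls to the two models, not real APIs, and the cap on rounds is an assumption to guarantee termination.

```python
from typing import Callable

def dialectic(draft: str,
              critics: list[Callable[[str], list[str]]],
              revise: Callable[[str, list[str]], str],
              max_rounds: int = 10) -> str:
    """Run each critic over the draft and revise until no critic
    raises an issue (the dialectic is 'exhausted') or rounds run out."""
    for _ in range(max_rounds):
        issues = [i for critic in critics for i in critic(draft)]
        if not issues:
            return draft  # every model is satisfied
        draft = revise(draft, issues)
    return draft

# Toy stand-ins: a critic that objects until the spec has a test
# section, and a reviser that appends the missing section.
needs_tests = lambda d: [] if "## Tests" in d else ["missing test plan"]
add_section = lambda d, issues: d + "\n## Tests\n"

final = dialectic("# Spec\n", [needs_tests], add_section)
```

In practice each critic would be a prompt to a different model over the current markdown artifact, and `revise` would be the implementing agent folding the critique back in.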
Adding another external model/agent is exactly what I have been planning as the next step. In fact, I already paste the implementation and test summaries into ChatGPT, and it is extremely helpful in hardening requirements, making them more extensible, or picking up gaps between the implementations and the initial specs. It would be very useful to have this in the workflow itself, rather than the coding agent reviewing its own work; there is a sense that it is getting tunnel-visioned.
I agree that CC seems like a better harness, but I think GPT is a better model. So I will keep it all inside the Codex VSCode plugin workflow.
A few times in my life I have found job opportunities that would have been my dream job, and I was uniquely qualified due to a cross-disciplinary background, previous experience and education, language skills, and such. I was an SME with technical skills, and I had so much knowledge of the company's products, industry, and competitors that I could have done their marketing strategy and product strategy in a couple of weeks. Maybe it wouldn't all have been correct from the start, but I had so much knowledge that I could have done this by heart.
I spent a lot of time on targeted applications for these places, re-doing my CV and spending weeks iterating on my cover letter. I never heard back from any of those places.
Instead I've been hired into industries I knew nothing about. Sure, I was a decent candidate, but I was just another candidate. This has worked out fine.
Why did these places hire me and not the others? Because they were growing so they had a need to hire. The former places did not.
So for me the only real advice is to apply to places that are growing. When places are growing and really need to hire to expand, all the bullshit in the process is eliminated. Decisions are made fast. It's easier and more pleasant.
> So for me the only real advice is to apply to places that are growing.
Or sometimes when people are leaving and they need a replacement ASAP. That's how I was hired, but it was also quite lucky that there were not many applicants.
1) I need things, unless you want a naked homeless woman without a laptop to apply to your startup
2) Great, still have no job experience or degree
3) I sell everything and show up to the valley naked. I'm suddenly homeless looking for a job in a place where I have no connections, no degree, and no laptop to actually apply for a job or do remote gigs
4) AI startups number 1-4 are doomed to fail, 5 and 6 expect me to work 12 hour days with little pay and no benefits, and 7-9 won't hire me without a degree or nepotism
5) In the extremely unlikely chance I get hired given the circumstances described, I'll be laid off when the company goes belly up or the shareholders demand a new yacht
> expect me to work 12 hour days with little pay and no benefits
This is not how it should be, but sometimes when you have no other opportunities this is what you need to do to open doors.
Capitalism is vicious and not participating because you (rightfully) find it demeaning is your prerogative, but it is also poor local optimization.
I joined the military and walked with PTSD not because I’m a patriot but because I needed to pay for college. I got out and took an internship well below the poverty line. I turned that into an entry level job, climbed the ladder to build a cushion. Started several side businesses trying to improve my lot. One was successful enough now I can live a comfortable life.
The world sucks but you’ll be eaten alive if you wait for it to become fair. Some obstacles are stepping stones.
Plenty of people do this. Go to New York, SF or LA and you literally can't get a cup of coffee without running into someone that is stuck in a dead end service job, often for years, because they tried what you suggest.
Take risks, sure, but step 0 is have a resourced backup plan, and know when to execute it.
Loads of people do this! It's a big part of why accommodation is so expensive in major cities! It's just that it requires capital upfront, many have a degree filter, and it's still easy for the number of young people to outpace the available jobs.
https://www.hpcwire.com/2026/02/23/why-nvlink-is-nvidias-sec...