If you'd asked me last year to run an autonomous research loop across two GPUs, I'd have said that's not something I can do.
Not "it'll take a while." Impossible.
Today this is my Saturday.
My 2x RTX 4090 rig: autonomous hyperparameter mutation, parallel experiments, no babysitting.
Results:
→ 17 experiments, 0 crashes
→ baseline val_bpb 1.2365 → 1.2182
→ 1.48% improvement. Found by the loop.
The chart shows the staircase, the same pattern Karpathy sees in his runs. His: 2 days, 276 experiments. Mine: 1 hour, 17. Same logic, different constraints.
The constraint here is memory bandwidth, which means ~5.5% MFU. The GPU is mostly waiting on memory transfers, not computing.
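For context, MFU (model FLOPs utilization) is just achieved FLOPs over the hardware's peak. A minimal sketch with assumed numbers; the 165 TFLOPS BF16 tensor-core peak commonly cited for a 4090 and the achieved figure below are illustrative, not measurements from this run:

```typescript
// Sketch: how MFU is derived. Both numbers are assumptions for illustration.
const peakFlops = 165e12;     // assumed: RTX 4090 BF16 tensor-core peak, ~165 TFLOPS
const achievedFlops = 9.1e12; // assumed: FLOPs/s actually sustained by the training step
const mfu = achievedFlops / peakFlops;
console.log(`MFU: ${(mfu * 100).toFixed(1)}%`); // ~5.5%
```

At that ratio, faster compute barely helps: the bottleneck is moving data, not multiplying it.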
What the same loop looks like on different hardware:
[hardware comparison table lost in extraction: approximate experiments/hour and MFU per GPU tier]
Same autoresearch loop. Just more runway.
On Karpathy's setup, 8x H100 running for 48 hours would likely hit 5,000+ experiments. On 4090s that's not feasible, but 24-48 hours on what I have would still find significantly more than 1 hour did. That's what's running next.
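Back-of-the-envelope, extrapolating from the measured rate (17 experiments in 1 hour) and assuming the rate stays roughly constant over a longer run:

```typescript
// Extrapolating the measured throughput to the planned longer runs.
const experimentsPerHour = 17; // measured: 17 experiments in the 1-hour run
for (const hours of [24, 48]) {
  console.log(`${hours}h → ~${experimentsPerHour * hours} experiments`);
}
// 24h → ~408, 48h → ~816
```

Still an order of magnitude short of 5,000+, but a much bigger search budget than the first hour had.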
This is my own multi-GPU implementation built on top of his single-GPU original, orchestrated with iii functions, workers, and triggers.
Claude Code made the extension possible in a weekend.
Built n-autoresearch. From comments on X I saw a need to run agent swarms in parallel across multiple GPUs, so this is a structured continuation of Karpathy's autoresearch project, powered by Worker / Function / Trigger as primitives.
What's different:
→ 21 functions, 23 triggers, 8 KV scopes for structured experiment state
→ Polyglot: TypeScript orchestrator + Rust GPU worker
→ Multi-GPU parallel experiments with adaptive search (explore → exploit → combine → ablation)
→ External agents call functions via REST: no LLM baked in, similar to autoresearch itself.
Same val_bpb hill-climbing loop, but with proper state management, crash recovery, near-miss tracking, and structured reporting.
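The core loop can be sketched roughly like this. A minimal single-process sketch: the real orchestrator runs candidates in parallel across GPUs via iii workers and cycles through the explore → exploit → combine → ablation phases; `mutate()` and `runExperiment()` below are illustrative stand-ins, not the actual implementation.

```typescript
type Config = Record<string, number>;
type Result = { config: Config; valBpb: number };

// Hypothetical mutation: perturb one hyperparameter by up to ±20%.
function mutate(base: Config): Config {
  const keys = Object.keys(base);
  const k = keys[Math.floor(Math.random() * keys.length)];
  return { ...base, [k]: base[k] * (0.8 + Math.random() * 0.4) };
}

// Stand-in for a real training run that returns validation bits-per-byte.
async function runExperiment(config: Config): Promise<Result> {
  const valBpb = 1.2365 * (0.99 + Math.random() * 0.02); // fake metric near baseline
  return { config, valBpb };
}

// Hill-climb: keep whichever config scores lowest; lower val_bpb is better.
async function hillClimb(baseline: Config, budget: number): Promise<Result> {
  let best = await runExperiment(baseline);
  for (let i = 0; i < budget; i++) {
    const candidate = await runExperiment(mutate(best.config));
    if (candidate.valBpb < best.valBpb) best = candidate;
  }
  return best;
}
```

The state management, crash recovery, and near-miss tracking layer on top of this: every candidate and score lands in KV scopes, so the loop can resume mid-search instead of restarting.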
Also added a pure iii-sdk worker for json-render UI generation, with JSONL patch streaming, caching, rate limiting, and validation. No standalone HTTP server: everything is just endpoints, i.e. iii functions with HTTP triggers served by the iii engine.
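The "no standalone server" shape looks roughly like this. `httpTrigger` and `dispatch` below are hypothetical stand-ins I'm using for illustration, not the real iii-sdk API:

```typescript
// Hypothetical sketch: each HTTP route is just a function plus a trigger,
// dispatched by an engine. Names are illustrative, not the real iii-sdk API.
type HttpRequest = { path: string; body: unknown };
type HttpResponse = { status: number; body: unknown };
type IIIFunction = (req: HttpRequest) => Promise<HttpResponse>;

const functions = new Map<string, IIIFunction>();

// "Trigger": binds a route to a function in the registry.
function httpTrigger(route: string, fn: IIIFunction): void {
  functions.set(route, fn);
}

// An external agent hits this to report an experiment result; no LLM baked in.
httpTrigger("/experiments/report", async (req) => {
  const result = req.body as { id: string; valBpb: number };
  // ...persist to a KV scope, update best-so-far, emit a structured report...
  return { status: 200, body: { accepted: result.id } };
});

// The "engine" side: resolve the route and invoke the function, or 404.
async function dispatch(req: HttpRequest): Promise<HttpResponse> {
  const fn = functions.get(req.path);
  return fn ? fn(req) : { status: 404, body: "no such function" };
}
```

The point of the pattern: agents integrate over plain REST calls, and the same function registry backs every endpoint without a hand-rolled server.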
If you want to try it, the repo is open source: https://github.com/iii-hq/n-autoresearch