Just to nitpick the math. If you are going to fire 50% of the company, the AI tools should actually make the remaining people 100% more efficient, not 50% :)
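A quick sanity check on that arithmetic (a minimal sketch with made-up headcount numbers): if half the staff is cut but total output must stay constant, per-person output has to double, i.e. +100%, not +50%.

```python
# Sanity check: cut 50% of staff while keeping total output constant.
headcount_before = 100
headcount_after = headcount_before // 2   # 50% laid off

total_output = 100.0                      # arbitrary units, held constant
per_person_before = total_output / headcount_before  # 1.0 units/person
per_person_after = total_output / headcount_after    # 2.0 units/person

efficiency_gain = per_person_after / per_person_before - 1
print(f"required efficiency gain: {efficiency_gain:.0%}")  # prints 100%
```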
it has been pretty much a benchmark for memorization for a while. there is a paper on the subject somewhere.
swe bench pro public is newer, but it's not live, so it will slowly get memorized as well. the private dataset is more interesting, as are the results there:
"We've deployed trillions of tokens across these agents toward a single goal. The system isn't perfectly efficient, but it's far more effective than we expected."
if it’s too hard for you to write, it’s too hard for you to understand. how are you going to take responsibility for that code and maintain it if needed?
At this point, it's 1.5M LOC without the vendored crates (so basically excluding the JS engine etc.). Compare that to Servo or Ladybird, which are 300k LOC each and actually happen to work; agents do love slinging slop.
Yeah, it's not executing any JavaScript. Hey Mr. Wilson! You've spent millions creating this worthless slop. How about making sure that the code is actually being executed? Or is that not necessary to raise millions more in VC funding?
Yeah, it seems the latest commit does let `cargo check` run successfully. I'm going to write an update blog post once they've made their statement, because I'm guessing they're about to say something.
Something fishy is happening in their `git log`; it doesn't seem like it was the agents who "autonomously" made things compile in the end. Notice the git usernames and email addresses switching around, and even some commits made inside an EC2 instance got in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
I am not an expert AI user, but one typical 'failure mode' I see constantly is the AI reimplementing features that already exist in the codebase, or breaking existing ones.
And then there's the cost: the blog post says they've spent trillions (plural!) of tokens on that experiment.
Looking at OAI API pricing, 5.2 Codex is $14 per 1 million output tokens, which makes a cool $14M for 1 trillion tokens (multiplied by whatever the plural is). For something that "kind of works".
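The back-of-the-envelope version of that estimate (assuming the quoted $14 per 1M output tokens, and ignoring any cheaper input or cached-token rates):

```python
# Rough cost estimate: 1 trillion output tokens at $14 per 1M tokens.
price_per_million = 14           # USD, quoted output-token rate
tokens = 1_000_000_000_000       # 1 trillion

cost = tokens / 1_000_000 * price_per_million
print(f"${cost:,.0f} per trillion tokens")  # prints $14,000,000
```

"Trillions" plural just multiplies that $14M floor by however many trillions were actually spent.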
It's a nice ad for OAI and Anysphere, but maybe next time just donate the money to a browser team?