Hacker News | Snuggly73's comments

Just to nitpick the math. If you are going to fire 50% of the company, the AI tools should actually make the remaining people 100% more efficient, not 50% :)
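A quick sketch of that arithmetic, assuming total output has to stay constant and everyone contributes equally (the function name is made up for illustration):

```python
def required_gain(fraction_fired: float) -> float:
    """Fractional productivity gain each remaining person needs
    so that total output stays the same after the layoff."""
    if not 0 <= fraction_fired < 1:
        raise ValueError("fraction_fired must be in [0, 1)")
    # Remaining headcount is (1 - f) of the original, so each person
    # has to produce 1 / (1 - f) times as much as before.
    return 1 / (1 - fraction_fired) - 1


print(f"{required_gain(0.5):.0%}")  # 100%
```

So cutting half the staff needs a doubling per person, not a 50% bump.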


And if you kept everyone and used AI you could expand the business. Oh wait, they are out of ideas.


it has been pretty much a benchmark for memorization for a while. there is a paper on the subject somewhere.

SWE-Bench Pro public is newer, but it's not live, so it will slowly get memorized as well. the private dataset is more interesting, as are the results there:

https://scale.com/leaderboard/swe_bench_pro_private


I mean, it's right there in their blog - https://cursor.com/blog/scaling-agents

"We've deployed trillions of tokens across these agents toward a single goal. The system isn't perfectly efficient, but it's far more effective than we expected."


if it’s too hard for you to write, it’s too hard for you to understand and comprehend. how are you going to take responsibility for that code and maintain it if needed?


Well, could it be because it was instructed to kinda "study" Servo?

https://github.com/wilsonzlin/fastrender/blob/3e5bc78b075645...


I watched them work in the new repo today - https://github.com/wilson-anysphere/fastrender/tree/main , adding another 50k lines trying to optimize scroll/rendering performance (spoiler: not really)

At this point, it's 1.5M LOC without the vendored crates (so basically excluding the JS engine etc.). Compare that to Servo/Ladybird, which are 300k LOC each and actually happen to work. Agents do love slinging slop.


Not sure - if it works, then who needs Cursor (and all other IDEs)? You just ask for a browser and it comes out of thin air.


This is from the "official" build - https://imgur.com/fqGLjSA

The "in progress" build has a slightly different rendering but the same result


Yeah, it's not executing any JavaScript. Hey Mr. Wilson! You've spent millions creating this worthless slop. How about making sure that the code is actually being executed? Or is that not necessary to raise millions more in VC funding?


The latest commit now builds and runs (at least on my Mac). It’s tragically broken and the code is…dunno…something. 3m lines of something.

I couldn’t make it render the Apple page that was in the Cursor promo. Maybe they’ve used some other build.


Yeah, seems the latest commit does let `cargo check` run successfully. I'm gonna write an update blog post once they've made their statement, because I'm guessing they're about to say something.

Something fishy is happening in their `git log`: it doesn't seem like it was the agents who "autonomously" made things compile in the end. Notice the git username and email addresses switching around; even some commits made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...


Noticed that as well - I think it was “manual”


I am not an expert AI user, but one typical 'failure mode' I see constantly is the AI reimplementing features that already exist in the codebase, or breaking existing ones.


And there is the thing about the cost. The blog post says that they've spent trillions (plural!) of tokens on that experiment.

Looking at OAI API pricing, 5.2 Codex is $14 per 1 million output tokens. Which makes a cool $14M for 1 trillion tokens (multiplied by whatever the plural is). For something that "kind of works".

It's a nice ad for OAI and Anysphere, but maybe next time - just donate the money to a browser team?
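The back-of-the-envelope cost above, as a small sketch. The $14 per million figure is the one quoted in the comment and is not verified here; the calculation also assumes every token is billed at the output rate, and the token counts beyond 1 trillion are purely illustrative since the blog only says "trillions".

```python
# Price quoted in the comment above; treat it as an assumption.
PRICE_PER_MILLION_OUTPUT_USD = 14.0


def output_cost_usd(tokens: int,
                    price_per_million: float = PRICE_PER_MILLION_OUTPUT_USD) -> float:
    """Cost in USD if every token is billed at the output-token rate."""
    return tokens / 1_000_000 * price_per_million


# "Trillions" is plural, so try a few hypothetical multiples.
for trillions in (1, 2, 3):
    cost = output_cost_usd(trillions * 10**12)
    print(f"{trillions} trillion tokens: ${cost:,.0f}")
```

1 trillion tokens is 1 million batches of 1 million tokens, so the first line comes out at $14,000,000.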

