I do disagree with the notion that you have to slog through a problem to learn efficiently. That it's either "the easy way [bad, you don't learn]" or "the hard way [good, you do learn]" is a false dichotomy. Agents / LLMs are like having an always-on, highly adept teacher who can synthesize information in an intuitive way, and that you can explore a topic with. That's extremely efficient and effective for learning. There may be some tradeoffs in some areas, but this idea that LLMs make you not learn doesn't feel right; they allow you to learn _as much as you want and about the things that you want_, which wasn't possible before. You had to learn, inefficiently(!), a bunch of crap you didn't want to in order to learn the thing you _did_ want to. I will not miss those days.
I don't think you're saying the same thing. AI can help you get through the hard stuff efficiently, and you'll learn. It acts as a guide, but you still do the work.
Completely offloading the hard work and just getting a summary isn't really learning.
No, it's "this tool cannot be used by bad guys or good guys, but can be used by highly funded labs that do neuroscience". It's something that freaks people out until they gradually learn what is actually involved
^ There's a research team at Meta that studies this. You need an MEG -- that's $2-5M, plus the shielded room it lives in and the experts who can operate it.
EEG doesn't work for this due to its low spatial resolution and how finicky electrode placement is for getting a good signal.
The signals from neurons are just unbelievably tiny and are in an absolute sea of noisy trash. No one is ever going to read your thoughts without your consent (or by wrestling you into a big MEG, in which case you have bigger things to worry about). No one is going to be reading your dreams with any sort of accuracy either.
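To put "unbelievably tiny" in perspective, here's a rough back-of-envelope comparison using approximate textbook figures (neural magnetic fields detectable by MEG are on the order of 100 femtotesla; Earth's static field is roughly 50 microtesla; these are ballpark values, not measurements):

```python
# Back-of-envelope: how small neural magnetic fields are relative to
# ambient magnetic background. Rough textbook values, for illustration only.
brain_signal_tesla = 100e-15   # ~100 femtotesla, typical MEG-detectable signal
earth_field_tesla = 50e-6      # ~50 microtesla, Earth's static field
urban_noise_tesla = 1e-7       # ~0.1 microtesla, typical urban magnetic noise

ratio_earth = earth_field_tesla / brain_signal_tesla
ratio_noise = urban_noise_tesla / brain_signal_tesla

print(f"Earth's field is ~{ratio_earth:.0e}x the neural signal")  # ~5e+08x
print(f"Urban noise is ~{ratio_noise:.0e}x the neural signal")    # ~1e+06x
```

Hence the multi-million-dollar shielded room: you're hunting for a signal eight orders of magnitude below the ambient field.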
> making sound architectural choices and maintaining long term business context and how it intersects with those architectural choices.
I completely agree with you, but this is rapidly becoming less and less the case, and it would not at all surprise me if, even by the end of this year, it's just barely relevant anymore.
> If you are eliminating those people from your business then I don't know that I can ever trust the software your company produces and thus how I could ever trust you.
I mean, that's totally fine, but do realize that many common load-bearing enterprise and consumer software products are a tower of legacy tech debt and junior engineers writing terrible abstractions. I don't think this "well, how am I going to trust you" from (probably rightfully) concerned senior SWEs is going to change anything.
Writing's on the wall, it is true: tech debt will no longer be a thing to care about.
"but who will maintain it?" massive massive question, rapidly becoming completely irrelevant
"but who will review it?" humans sure, with the assistance of ai, writing is also on the wall: AI will soon become more adept at code review than any human
I can understand "losing all legitimacy" being a thing, but to me that is an obvious knee-jerk reaction from someone who does not quite understand how this trend curve is going.
Trust me, I’m a well-seasoned, leathery developer, and I’m no newbie when it comes to using AI. But this level of irrational exuberance is so over the top I just can’t take it seriously.
Yes, in the very long term I expect this to be able to replace large swaths of the software dev lifecycle: product, ideation, the whole kaboodle. That’s a long way off, whatever “a long way off” means in this accelerated timeline.
For the next bunch of years, yes you’ll have to worry about architecture, coupling, testing, etc. I’m happy to have my competitors share your attitude, cause we’ll smoke them in the market.
I think the parent comment is saying “why did the agent produce this bug, and why wasn't it caught”, which is a separate problem from what granular commits solve, namely finding the bug in the first place.
There is no "why." It will give reasons but they are bullshit too. Even with the prompt you may not get it to produce the bug more than once.
If you sell a coding agent, it makes sense to capture all that stuff, because you (hopefully) have test harnesses where you can statistically tease out which prompt changes caused bugs. Most projects won't have those, and anyway you don't control the whole context if you are using one of the popular CLIs.
If I have a session history or histories, I can (and have!) mine them to pinpoint where an agent did not implement what it was supposed to, or to understand who asked for a certain feature and why, etc. It complements commits: sessions are more like a court transcript of what was said / claimed, and then you can compare that to what was actually done (the commits).
Some of my sessions are over 1GB at this point. I just don't think this scales usefully or meaningfully. Those things should live as summarized artifacts within issue tracking IMHO
No, you look at the session to understand what the context was for the code change: what did you _ask_ the LLM to do? Did it do it? Where did a certain piece of logic go wrong? Session history has been immensely useful to me, and it serves as important documentation of the entire flow of the project. I don't think people should have to look at session histories at all unless they need to.
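As a concrete sketch of that kind of mining: assuming sessions are stored as JSONL transcripts with `role` and `content` fields (a hypothetical format; real CLIs each have their own), pulling out the user turns that mention a feature is a few lines:

```python
import json

def find_requests(session_path, keyword):
    """Scan a JSONL session transcript for user turns mentioning a keyword,
    to recover what the agent was actually asked to do.
    (Assumes a hypothetical {"role": ..., "content": ...} line format.)"""
    hits = []
    with open(session_path) as f:
        for line_no, line in enumerate(f, 1):
            try:
                turn = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed or non-JSON lines
            if turn.get("role") == "user" and keyword.lower() in turn.get("content", "").lower():
                hits.append((line_no, turn["content"][:120]))
    return hits
```

You can then diff what was asked (the hits) against what actually landed in the commits.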
I'm not quite sure I understand the logic of this, or how people don't see that claims of "well, now everyone is going to be dumber because they don't learn" have been a refrain literally every time a major technological / industrial revolution happens. Computers? The internet? Calculators?
The skills we needed before are just no longer as relevant. It doesn't mean the world will get dumber; it will adapt to the new tooling and paradigm that we're in. There are always people who don't like the big paradigm change, who are convinced it's the end of the "right" way to do things, but they always age terribly.
I find I learn an incredible amount from using AI + coding agents. It's a _different_ experience, and I would argue a much more efficient one to understand your craft.
100%. I have been learning so much faster as the models get better at both understanding the world and explaining it to me at whatever level I am ready for.
Using AI as just a generator is really missing out on a lot.
Well, Opus and Gemini are probably running on multiple H200 equivalents -- maybe hundreds of thousands of dollars of inference equipment. Local models are inherently inferior; even the best Mac that money can buy will never hold a candle to latest-generation Nvidia inference hardware, and the local models, even the largest, are still not quite at the frontier. The ones you can plausibly run on a laptop (where "plausible" really means "45 minutes and making my laptop sound like it's going to take off at any moment") are further behind still. Like they said -- you're getting Sonnet 4.5 performance, which is two generations ago; speaking from experience, Opus 4.6 is night and day compared to Sonnet 4.5.
> Well Opus and Gemini are probably running on multiple H200 equivalents, maybe multiple hundreds of thousands of dollars of inference equipment.
But if you've got that kind of equipment, you aren't using it to support a single user. It gets the best utilization by running very large batches with massive parallelism across GPUs, so you're going to do that. There is such a thing as a useful middle ground that may not give you the absolute best performance but will be found broadly acceptable and still be quite viable for a home lab.
Batching helps with efficiency, but you can’t fit Opus into anything less than hundreds of thousands of dollars of equipment.
Local models are more than a useful middle ground; they are essential and will never go away. I was just addressing the OP's question about why they observed the difference they did. One is an API call to the world's most advanced compute infrastructure, and the other is running on a $500 CPU.
Lots of uses for small, medium, and large models; they all have important places!
I tried this today with this username and other usernames on this and other platforms, using Claude Code:
- First it told me it couldn't do this, that it was doxxing
- I said: it's for me, I want to see if I can be deanonymized
- Claude said: oh ok, sure, and proceeded to do it
It analyzed my profile contents and concluded that there were likely only 5-10 people in the world who would match this profile (it pulled out every identifying piece of information extremely accurately). Basically saying: I don't have access to LinkedIn, but if I did I could find you in like 5 seconds.
Anyway, like others have said: this type of capability has always been around for nation-state actors (it's just now frighteningly more effective), but for your stalker? For a fraudster or con artist? Everyone now has a tremendous, unprecedented amount of power at their fingertips, with very little effort needed.
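The "5-10 people in the world" conclusion is just anonymity-set arithmetic: each public attribute multiplies down the candidate pool. A sketch with invented selectivities (the percentages are made up for illustration, not measured values):

```python
# Rough anonymity-set arithmetic: each disclosed attribute shrinks the
# pool of people who could match a profile. Selectivities are invented.
population = 8_000_000_000

attributes = {
    "lives in a particular country":  0.04,   # ~4% of the world
    "works in software":              0.005,
    "specific technical subfield":    0.02,
    "years-of-experience band":       0.1,
    "mentioned type of employer":     0.05,
}

candidates = population
for name, selectivity in attributes.items():
    candidates *= selectivity
    print(f"after '{name}': ~{candidates:,.0f} people")  # ends around ~160
```

Five vague-sounding facts already get you from eight billion down to a couple hundred; a few more (a named project, an unusual hobby) and you're at single digits.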
World models are not a new idea; they come from the "model-free" and "model-based" reinforcement learning paradigms that have been around forever.
Model-free paradigms (which is what we do now, without world models) do not actually model what _happens_ when you take an action; they simply model how good or bad an action is. This is highly data-inefficient but asymptotically performs much better than model-based RL, because you don't have modeling biases.
Model-based RL, where world models come in, models the transition function T(s, a, s'): I'm in state s and I take action a -- what is my belief about my new state? By doing this you can do long-term planning, so it's not just useful for robotics and video generation but for reasoning and planning more broadly. It's also highly data-efficient, and right now, for robotics, that is absolutely the name of the game.
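The distinction fits in a toy example. Below is a minimal tabular sketch (a made-up 2-state, 2-action MDP, purely illustrative): the model-free learner estimates Q(s, a) directly from sampled rewards and never learns what state comes next, while the model-based learner records the transition and reward for each (s, a) and plans by rolling that model forward.

```python
import random
random.seed(0)

# Toy deterministic MDP: action 1 in state 0 moves to state 1, which pays
# reward 1; everything else returns to state 0 with reward 0.
def step(s, a):
    s_next = 1 if (s == 0 and a == 1) else 0
    reward = 1.0 if s_next == 1 else 0.0
    return s_next, reward

# --- Model-free: learn Q(s, a) = "how good is action a in state s"
# from samples; no model of the next state at all.
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
alpha, gamma = 0.1, 0.9
s = 0
for _ in range(2000):
    a = random.choice((0, 1))
    s_next, r = step(s, a)
    best_next = max(Q[(s_next, 0)], Q[(s_next, 1)])
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    s = s_next

# --- Model-based: learn T(s, a) -> (s', r) from experience, then plan.
model = {}
for _ in range(2000):
    s0 = random.choice((0, 1))
    a = random.choice((0, 1))
    model[(s0, a)] = step(s0, a)  # deterministic env: last sample suffices

# One-step planning with the learned model: from state 0, which action
# do I *believe* leads to reward?
plan = max((0, 1), key=lambda a: model[(0, a)][1])
print("model-free best action in state 0: ", max((0, 1), key=lambda a: Q[(0, a)]))
print("model-based planned action in state 0:", plan)
```

Both land on the same action here, but note the difference in what was learned: the Q-table can only rank actions, while the model can answer "what happens if" questions, which is what makes longer-horizon planning (and the data efficiency the comment mentions) possible.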
What you will see is: approximately zero robots, then approximately one crappy robot (once performance + reliability jusssst cross the boundary where you can market it, even at a loss! People will buy it and put it in their homes). Once that happens you get the magic: a data flywheel for robotics, and things start _rapidly_ improving.
Robotics is where it is because it lacks the volume of data we have on the internet. For robotics today it's not only e.g. egocentric video that we need, but also _sensor-specific_ and _robot-specific_ data (e.g. robot A has a different build + components than robot B).
Yes perfect advice -- negotiate from a position of leverage and always ask for that little bump at the end.
In my experience, not talking about salary early sets everyone up to waste their time. One time it ended with a full interview process that went very well, for a job I thought would be perfect, in an industry that _should_ have outstanding pay -- and the resulting offer was lower than my current role, paid hourly without benefits, with a vague promise of later becoming an FTE. Not only did we all waste our time, I was pretty upset about it. When I sent an email to the hiring manager, they said, "well, you never told us your expectations." Now, the guy was dumb -- he _knew_ I had a good job already, the comp he was offering was well below industry standard, and he knew my background -- but nevertheless that is where a lot of hiring folks' heads are at, and it makes total sense: they want to get a good deal, just like you do.
Asking for the salary band is good, especially earlier in your career, but to me it's now kind of irrelevant -- for the same reason you will go high, they will try to go low. I have a price I will be happy at; I say a higher number in the beginning, but add that depending on how everything goes there may end up being flexibility, and that I look at the entire package holistically. This is just to assess: "is it worth us continuing to engage?" Not doing this would have wasted a colossal amount of time.
I'm now in a position where I know where salaries generally are in different parts of the industry, and I can set a price based on what I expect and what my current role is, and I explain my reasoning. But yes: it depends so much on the situation. Benefits good? Growth potential at a startup good? Do I believe in the mission, and that the founder won't abandon it for an acquihire and tank my equity? Etc.