> obviously created by people who care deeply about the quality of the product they produce
This obviously doesn't represent all of the billions of dollars spent on software like Salesforce, SAP, RealPage, Booking.com, etc. etc. (all notoriously buggy, slow, and complex software). You can't tell me with a straight face that all of the thousands of developers who develop these products/services care deeply about the quality of the product. They get real nice paychecks and benefits, and they put dinner on the table for their families. That's the market.
> There is no substitute for high quality work.
You're right because there really isn't a consistent definition of what "high quality" software work looks like.
> This obviously doesn't represent all of the billions of dollars spent on software like Salesforce, SAP, RealPage, Booking.com, etc. etc. (all notoriously buggy, slow, and complex software). You can't tell me with a straight face that all of the thousands of developers who develop these products/services care deeply about the quality of the product. They get real nice paychecks and benefits, and they put dinner on the table for their families. That's the market.
Those first three are "enterprise" or B2B applications, where the person buying the software is almost never one of the people actually using it. This disconnect means the person making the buying decision cannot meaningfully judge the quality of any given piece of software beyond a surface level, where slick demos can paper over huge quality issues, because they do not know how it is actually used or what problems the actual users regularly encounter.
Users care about quality, even if the people buying the software do not. You can't just say "well, the market doesn't care about quality" when the market incentives are broken for a particular type of software. When the market incentives are aligned between users and purchasers (such as when they are the same person), quality tends to become very important for the market viability of software (see Windows in the consumer OS market, which is perceptibly losing share to macOS and Linux following a sustained decline in quality over the last several years).
You literally just told me the market doesn't care about quality. I don't get what point you're trying to make?
> When the market incentives are aligned between users and purchasers (such as when they are the same person) quality tends to become very important for the market viability of software
Right, but this magical market you're talking about doesn't exist. That's my point.
Have you seen Facebook's code quality? Have you seen the code at any big Chinese corpo? There are a lot of very profitable businesses in the world with endless amounts of tech debt. But tech debt is not necessarily a big deal in most scenarios. Obviously I'm not talking about mission-critical software, but for general consumer/business software, it's fine. The hard part is understanding where you can cut costs / add debt, and that comes from requirements gathering.
> You can't tell me with a straight face that all of the thousands of developers who develop these products/services care deeply about the quality of the product.
What about caring and being depressed because quality comes from systems rather than (just) individuals?
I couldn't book travel at a previous company because my address included a `.`, which passed their validation. Awful, awful software. I wouldn't expect slop code to improve it.
Now imagine how much they would make if their software was good.
Google, Facebook, Apple clearly care deeply about the quality of their code. They have to because bugs, bad performance, outages, vulnerabilities have very direct and immediate costs for them. I know Amazon and Microsoft have their critics but I bet they are also better than we give them credit for.
There are factors besides software quality that affect their success. But running bad software certainly isn’t going to help.
>Google, Facebook, Apple clearly care deeply about the quality of their code.
Yea, idk about that one.
They definitely did care in the past. They had to if they wanted to get users. But they stopped caring a good while ago. Especially Microsoft. The costs that bad code would bring them are lower than the cost of developing good code, because they can mostly rely on monopolies and anti-competitive practices for user retention. Their users are more like hostages than anything else.
Is Google much better? I don't see, for example, the care that used to go into the quality of organic search results.
They seem fine with the output of the current hodge-podge of the original algorithm's results plus massaging by many downstream ML pipelines that run one after the other, without context of how each step might affect the next.
> You're right because there really isn't a consistent definition of what "high quality" software work looks like.
And if you can deterministically define "high quality software" with linters, analysers, etc., then an AI agent can also create high quality software within those limits.
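For instance, a hypothetical deterministic "quality gate" - the tools and commands below are just examples of such checks, not any fixed standard:

```python
# Hypothetical quality gate: if "high quality" is defined as whatever
# these deterministic checks accept, then any author that passes them
# -- human or AI agent -- meets the definition. Tool choices are examples.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],     # lint rules
    ["mypy", "--strict", "."],  # static type checking
    ["pytest", "-q"],           # the test suite
]

for cmd in CHECKS:
    if subprocess.run(cmd).returncode != 0:
        sys.exit(f"quality gate failed at: {' '.join(cmd)}")
print("quality gate passed")
```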
> With a range of ~1000km this seems to crush these results
The 1000km range likely has more to do with the efficiency of the drivetrain and the aerodynamics of the car than with the battery tech. A kWh is an absolute, fungible unit, and the Denza has a 122.5 kWh battery pack, which means it's getting about 5 mi/kWh. For perspective, my Rivian R1S gets ~350 miles on a 135 kWh pack, which is about 2.5 mi/kWh (so about half that).
The only part of the battery tech that could affect range is the weight. Sodium batteries are typically much heavier than Li-ion. I believe the Denza uses LFP, which means it's likely somewhere else on the car that they're gaining the improvement in range - not from the battery tech. That being said, the battery tech definitely affects the charge/discharge rates.
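For what it's worth, the arithmetic checks out (pack sizes and ranges taken from the figures above):

```python
# Back-of-the-envelope efficiency check using the figures quoted above.
KM_PER_MILE = 1.609

denza_range_mi = 1000 / KM_PER_MILE      # ~621 mi from the ~1000 km claim
denza_pack_kwh = 122.5
print(denza_range_mi / denza_pack_kwh)   # ~5.1 mi/kWh

r1s_range_mi = 350
r1s_pack_kwh = 135
print(r1s_range_mi / r1s_pack_kwh)       # ~2.6 mi/kWh, about half
```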
> The only part of the battery tech that could affect range is the weight.
Weight is a pretty minor factor for cars, sub-percent (Aging Wheels did a comparison using a pickup empty versus loaded with a pallet of shingles, though with a more efficient vehicle the influence of weight probably shows up more).
Energy density (amount of energy per unit of volume) is a much bigger factor than specific energy (amount of energy per unit of mass): it means you can either cram more energy into the same volume for more range, or have a lower vehicle with better aero.
Sodium-ion batteries will always be heavier than the best lithium-ion batteries, but for now they have about the same energy per kilogram as LFP batteries.
So they have two essential advantages over LFP: retention of capacity at much lower temperatures, and a cost that will become significantly lower once their production technology matures, because they use neither lithium nor other expensive materials such as nickel or cobalt.
> The only part of the battery tech that could affect range is the weight.
Doesn't the charging speed affect how much regenerative braking can be done? If you have to stop quickly enough, or the battery is sufficiently hot/full/etc., then a battery that can't charge as fast means more of the energy has to be lost.
Not really, even at “low” charging speeds you have more than enough braking.
Braking is the reverse of accelerating so the rate is about the same for the same acceleration (positive or negative).
It’s really only if the battery is extremely close to full that this is a potential issue, and that’s assuming the manufacturer either has little to no buffer, or didn’t take this into account and won’t regen into (some of) the buffer.
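A rough back-of-the-envelope for the symmetry point (the mass and deceleration below are assumed for illustration, not taken from the thread):

```python
# Power during braking follows the same P = m * a * v as acceleration,
# so moderate regen stays within what a pack built to deliver quick
# acceleration can absorb. It peaks the moment braking starts and
# tapers to zero as the car slows.
mass_kg = 2000          # assumed: a typical mid-size EV
speed_ms = 100 / 3.6    # 100 km/h in m/s
decel_ms2 = 2.0         # assumed: comfortable braking, about 0.2 g

peak_power_kw = mass_kg * decel_ms2 * speed_ms / 1000
print(f"~{peak_power_kw:.0f} kW peak")  # ~111 kW
```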
> Braking is the reverse of accelerating so the rate is about the same for the same acceleration (positive or negative).
If the battery is hot and you want to accelerate, increasing the 0-60 time from 3 seconds to 10 seconds isn't a problem for ordinary usage. If the battery is hot and you want to stop, increasing the stopping time isn't acceptable so the car is going to use the friction brakes instead.
Ok, but the Rivian R1S is a particularly inefficient EV (2-2.5 mi/kWh = 31-25 kWh/100 km). 12.5 kWh/100 km is efficient but not outlandishly so considering these are likely CLTC ranges, which are higher than WLTP which are higher than EPA, and the car in question is not in fact a dumptruck.
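For reference, converting between the two ways of quoting efficiency in this thread:

```python
# mi/kWh and kWh/100 km are reciprocals of each other up to unit factors.
KM_PER_MILE = 1.609

def kwh_per_100km(mi_per_kwh: float) -> float:
    return 100 / (mi_per_kwh * KM_PER_MILE)

print(kwh_per_100km(2.0))  # ~31 kWh/100 km (low end quoted for the R1S)
print(kwh_per_100km(2.5))  # ~25 kWh/100 km
print(kwh_per_100km(5.0))  # ~12.4 kWh/100 km (the Denza figure)
```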
> You need to think in terms of a probability of a successful hallucination or prompt injection.
I would venture to say that an ACID-compliant deterministic database has a 99.999999999999999999% chance of retrieving the correct information when queried with the correct SQL statement. An LLM, on the other hand, is more like 90%. LLMs are, by design, meant to hallucinate. I don't necessarily disagree with your sentiment, but closing the gap from 90% to 99.999999999999999999% is a much bigger challenge than the improvement from 0% to 90% was... unless something materially changes about how LLMs work at a fundamental level.
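To make the determinism point concrete, a minimal sketch (the table and values are made up):

```python
# The same SQL query over the same data returns the same row every time;
# there is no sampling step that could "hallucinate" a different answer.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 99.95)")

for _ in range(3):
    print(conn.execute("SELECT total FROM orders WHERE id = 1").fetchone())
    # (99.95,) all three times
```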
> The whole point of OpenClaw is to run AI actions with your own private data, your own Gmail, your own WhatsApp, etc. There's no point in using OpenClaw with that much restriction on it.
Hard disagree. I have OpenClaw running with its own Gmail and WhatsApp on its own Ubuntu VM. I just used it to help coordinate a group travel trip. It posted a daily itinerary for everyone in our WhatsApp group and handled all of the "busy work" I hate doing as the person who books the "friend group" trip - everything from "what time are we doing lunch at the beach club today?" to "what's the gate code to get into the Airbnb again?"
My next step is to have it act on my behalf: "message these three restaurants via WhatsApp and see which one has a table for 12 people at 8pm tonight". I'm not comfortable yet having it do that for me, but I'm getting there.
Point is, I get to spend more valuable time actually hanging out and being present with my friends. That's worth every dollar it costs me ($15/month T-Mobile SIM card).
I believe you only need a unique phone number to create the account, then you can use WhatsApp Web as client. Be very careful with alternative clients, as I've had an account banned in the past for this (and therefore a phone number blacklisted), even without messaging anybody. I think that clients that run WhatsApp Web in a web view (like https://github.com/rafatosta/zapzap) are safe.
I think they started banning unauthorized API users around the time that "WhatsApp For Business" was introduced, because it was competing with that product. Unfortunately WhatsApp For Business is geared toward physical products and services with registered companies, so home automation and agents are left with no options.
I believe you can use a virtual number/VoIP (like Twilio or Google Voice), but I want to be able to eventually use SMS where WhatsApp can't be used, and I know some services identify "non-residential" SMS phone numbers (for example, I've seen Google Voice numbers blocked), so I wanted to prevent that from happening. Again, the key thing here for me is that my assistant appears to be a human.
The number of times I realized halfway through that I'd probably typed the wrong password, and so vigorously hit the 'delete' key to reset the input, is too damn high.
The "just" in that sentence is wholly unjustified. There are plenty of CLI/TUI/console/shell shortcuts that are incredibly useful, yet they are wholly undiscoverable and do not work cross-platform, e.g. shell motions between macOS and reasonable OSes.
All the movement commands I know work the same in the terminal on a default install of macOS as they do in the terminal on the various Linux distros I use.
Ctrl+A to go to beginning of line
Ctrl+E to go to end of line
Esc, B to jump cursor one word backwards
Esc, F to jump cursor one word forward
Ctrl+W to delete backwards until beginning of word
And so on
Both in current versions of macOS where zsh is the default shell, and in older versions of macOS where bash was the default shell.
Am I misunderstanding what you are referring to by shell motions?
Yeah, but Ctrl+arrows to move the cursor between 'words' don't work, which is especially sad when SSH'ing in from Linux. It works fine when using Terminal on macOS - you just use Command+arrows.
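For what it's worth, this is usually fixable on the readline side with two key bindings - a sketch, assuming your terminal sends the common `\e[1;5C`/`\e[1;5D` sequences for Ctrl+Right/Ctrl+Left:

```python
# The same two binding lines, placed in ~/.inputrc, apply to bash and any
# other GNU readline program; Python's own readline module is used here
# just to demonstrate. (macOS Pythons linked against libedit use a
# different binding syntax.)
import readline

readline.parse_and_bind(r'"\e[1;5C": forward-word')
readline.parse_and_bind(r'"\e[1;5D": backward-word')
```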
It's built into the Unix terminal driver. Control-U is the default, but it can be changed with e.g. "stty kill". Libraries like readline also support it.
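A quick way to see this from Python's standard termios module (Unix only, run from a real terminal):

```python
# Read the terminal driver's current kill character -- the one that
# "stty kill" changes. Requires stdin to be an actual tty.
import sys
import termios

attrs = termios.tcgetattr(sys.stdin.fileno())
cc = attrs[6]                   # the control-character array
print(repr(cc[termios.VKILL]))  # b'\x15' by default, i.e. Ctrl-U
```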
I have had a similar issue where I thought my computer had gone to sleep, so I started typing my password while the monitor woke up, only to realize that just the screen had turned off and the computer was already unlocked - so when I hit enter, the password was sent into a Slack thread or DM instead.
But yeah, I never thought this was a problem anyone else dealt with. My passwords are all variants of my own "master password", and I sometimes forget which session I'm in, so to save keystrokes I count backward to where I think the cursor should be.
LLMs by their nature are not goal-oriented (this is a fundamental difference between reinforcement learning and plain neural networks, for example). So a human will have, let's say, the ultimate goal of creating value with a web application they build ("save me time!"). The LLM has no concept of that. It's trying to complete a spec as best it can with no knowledge of the goal. Even if you tell it the goal, it has no concept of the process needed to achieve the goal or to confirm it was attained - you have to tell it that too.
I don't know; I would assume it works, but I would not expect it to be free of bugs. But that is the baseline for code: being correct - up to some bugs - is the absolute minimum requirement; code quality starts from there - is it efficient, is it secure, is it understandable, is it maintainable, ...
So do you expect it not to be free of bugs because you've run a comprehensive test on it, read all of the code yourself or are you just concluding that because you know it was generated by an LLM?
It has not been formally verified, which is essentially the only way to achieve defect-free code with reasonable confidence. Several studies have found roughly between one and twenty bugs per thousand lines of code in any software; this project has several thousand lines of code, so I would expect several bugs if it were written by humans, and I have no reason to assume that large language models outperform humans in this respect, not least because they are trained on code written by humans and have been trained to generate code as humans write it.
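The expected-bug arithmetic, with an assumed line count purely for illustration:

```python
# Defect-density estimate using the commonly cited 1-20 bugs per KLOC.
lines_of_code = 5000  # assumed stand-in for "several thousand lines"
for bugs_per_kloc in (1, 20):
    expected = bugs_per_kloc * lines_of_code / 1000
    print(f"{bugs_per_kloc} bugs/KLOC -> ~{expected:.0f} expected bugs")
```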
But you said "it's not great code" and then said "i don't know", so your idea of it being "not great code" is purely speculative and totally unfounded.
No, my judgment of not great code is not based on what the code does - and if it does so correctly - but on how the code is written. Those are independent things, you can have horrible code that does what it is supposed to do but you can also have great code that just does the wrong thing [1].
[1] I would argue, however, that the latter is rarer, as it requires competent developers; even so, that does not preclude some misunderstanding of the requirements.
It works really well, multiple people have been using it for a month or so (including me) and it's flawless. I think "not great" means "not very readable by humans", but it wasn't really meant to be readable.
I don't know if there are underlying bugs, but I haven't hit any, and the architecture (which I do know about) is sane.
He was a partner at YC for 8 years
He has no research/PhD background in AI and is the CEO of an AI company
There is no objective data point in which he's a better CTO than a CEO