This is wrong. Waymo uses both lidar and radar, which Tesla opted to remove. Also, look at Waymo's cars. Those things have sensors strapped all over in ways that give them lots of range and visibility. Tesla will never do that because it compromises the aesthetic.
They're different in practically every way. It's easier to enumerate the things they have in common: NNs borrowed the idea of using a connected network of functions whose outputs feed into each other's inputs.
That's it. That's the total resemblance between the two. The brain isn't just an NN implemented in biology; it has whole systems that aren't accounted for in digital NNs, like hormones and neurotransmitters, and even the system of connected neurons doesn't work the way digital NNs implement it.
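To make that one shared idea concrete: a digital NN "neuron" is just a weighted sum pushed through a nonlinearity, and a network is layers of those feeding each other. A minimal Python sketch (toy weights, nobody's real model):

    import math

    # A "neuron" is just a function: weighted sum of inputs -> squashing nonlinearity.
    def neuron(inputs, weights, bias):
        return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)

    # A layer is a list of such functions; each layer's outputs become the
    # next layer's inputs. That really is the whole borrowed idea.
    def layer(inputs, weight_rows, biases):
        return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

    x = [0.5, -1.2]                                       # input signal
    h = layer(x, [[0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1])   # hidden layer
    y = layer(h, [[0.7, -0.2]], [0.05])                   # output layer
    print(y)

No hormones, no neurotransmitters, no spike timing, which is the point.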
Neural networks model the brain exactly as well as objects in OOP model cells: not very well at all. They're inspired by biology, nothing more.
Neural networks as used in AI are inspired by the brain in much the same way that OOP was inspired by the way cells work--neither one is an attempt to faithfully recreate the actual operations of an extremely complex (and only partially understood!) biological system.
The problem with driving in video games is using a keyboard/mouse or controller. Driving with a steering wheel and pedals is pretty easy. Even more so if you have the monitors to give you a realistic field of view.
driving a car in a video game with a steering wheel is easy because it's an experience designed from the ground up with that interface in mind. driving games happily do shit like change the fov to make the user think they're going faster, etc.
being easy to drive with a steering wheel in a driving game, and being easy to drive a real car with a steering wheel (with internet level latency and packet loss at play, mind you) are very different things
Pretty much all simulator games have to cheat. Because most of us can't hop into a Formula 1 race car or the cockpit of an F-22 and drive/fly it with no real training. (Much less race other drivers or fight in a dogfight.)
>being easy to drive with a steering wheel in a driving game, and being easy to drive a real car with a steering wheel (with internet level latency and packet loss at play, mind you) are very different things
Sure, but I don't think it's that different from driving a real car looking out a monitor-sized windshield. You might lose some braking and cornering without your inertial perceptions, but you'd still be able to drive around easily.
Either way, I think a speedometer is an assumed input for a self-driving car, and from that you can calculate almost everything needed related to proprioception.
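For what it's worth, most of the proprioception-like quantities fall out of the speed signal with plain finite differences. A toy sketch (made-up samples, assumed 10 Hz sampling):

    # Deriving acceleration (and a crude jerk estimate) from nothing but
    # periodic speedometer readings. The sample values are invented.
    speeds_mps = [20.0, 20.5, 21.2, 21.1, 20.4]   # m/s, one reading every 0.1 s
    dt = 0.1

    accel = [(s1 - s0) / dt for s0, s1 in zip(speeds_mps, speeds_mps[1:])]
    jerk = [(a1 - a0) / dt for a0, a1 in zip(accel, accel[1:])]

    print(accel)   # longitudinal acceleration, m/s^2
    print(jerk)    # rate of change of acceleration, m/s^3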
Traction control software uses touch (knowing how much grip the tires have via slip), proprioception (knowing the steering angle), and equilibrioception (accelerometers).
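The slip part boils down to comparing wheel speed against vehicle speed. A toy version, with an assumed 15% slip threshold (real systems are far more involved):

    def slip_ratio(wheel_speed_mps, vehicle_speed_mps):
        """Longitudinal slip: ~0 for a freely rolling tire, ~1 for a wheel spinning on ice."""
        if vehicle_speed_mps == 0:
            return 0.0
        return (wheel_speed_mps - vehicle_speed_mps) / vehicle_speed_mps

    def traction_control(wheel_speed_mps, vehicle_speed_mps, torque_request_nm):
        # Hypothetical rule: back off torque whenever slip exceeds ~15%.
        if slip_ratio(wheel_speed_mps, vehicle_speed_mps) > 0.15:
            return torque_request_nm * 0.5
        return torque_request_nm

    print(traction_control(23.0, 20.0, 300.0))   # spinning wheel -> torque cut to 150.0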
If you are referring to humans, we have a couple of orders of magnitude more connections, our neurons can achieve the same functions with an order of magnitude fewer units, and our brain has better inductive biases.
By which metrics are human eyes outperforming “any camera that can be put in a car today”? This seems unlikely in almost any domain, given the much wider dynamic range camera sensors can capture - cameras can see into parts of the spectrum we simply can’t (infrared, UV…) and operate at much lower levels of light than a human eyeball while retaining full color vision, using really cheap tech. They also don’t get tired or worse with age, or forget to wear their glasses, which is nice.
This strikes me as a pretty odd statement to make, personally!
“There is no real comparison” - for the benefit of the less informed, please make the comparison, assuming you are able.
The human eye can perceive about 21 stops of dynamic range, much better than regular cameras. Event cameras might solve that issue, but they're not used outside of research at the moment.
Maybe in a single still capture? Let’s not forget cameras can easily combine multiple exposures into a single capture to substantially increase dynamic range, and can do so at high frame rates, and can go well beyond 21 stops in doing so. The human eye is stuck with the same ~21 stop range regardless.
If you use a pair of digital sensors with a 15-stop exposure offset between them (seems fair - humans have two…), that's ~30 stops in a single shot if we assume the best we get is 15 from a digital sensor. Again, though, with high-speed exposure blending and one sensor this is not really necessary in a lot of cases.
The practical reality is digital capture can exceed 21 stops and you don’t need particularly fancy equipment to do it. Two decent cellphone grade sensors (~12-14 stops) would be enough if you don’t want to do single sensor blends and would work well for real-time video applications.
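Back-of-the-envelope version of that arithmetic, since a stop is just a factor of two in luminance (the 15-stop figures are the assumptions from above, not measurements):

    def stops_to_contrast_ratio(stops):
        # Each stop doubles the luminance range the capture can span.
        return 2 ** stops

    per_sensor_stops = 15    # assumed best case for a single digital sensor
    offset_stops = 15        # exposure offset between the two sensors

    # With the offset equal to a sensor's own range, the two exposures tile
    # the luminance axis end to end: ~30 stops of combined coverage.
    combined_stops = per_sensor_stops + offset_stops
    print(combined_stops)                           # 30
    print(stops_to_contrast_ratio(combined_stops))  # ~1.1e9 : 1
    print(stops_to_contrast_ratio(21))              # ~2.1e6 : 1, the eye's ~21 stops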
They can also "see" much further than the average human can [1].
The only way I could imagine that we are superior to phone cameras is stabilisation, something that could be resolved with vertical integration that informs the sensors and image processing units about forces being applied to the vehicle (though this is coming from somebody outside the field so take it with many grains of salt).
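That vertical-integration idea is roughly what gyro-assisted electronic stabilisation already does on phones: use the measured rotation to predict the pixel shift between frames and move the crop window to cancel it. A toy small-angle sketch (focal length and rates are invented):

    # Gyro-assisted stabilisation sketch. Small-angle approximation:
    # a rotation of theta radians shifts the image by roughly f * theta pixels.
    focal_length_px = 1400.0   # assumed lens focal length expressed in pixels
    dt = 1 / 30                # frame interval at 30 fps

    def stabilising_shift(pitch_rate_rad_s, yaw_rate_rad_s):
        """Pixel offset that cancels the rotation accumulated over one frame."""
        dx = -focal_length_px * yaw_rate_rad_s * dt     # horizontal compensation
        dy = -focal_length_px * pitch_rate_rad_s * dt   # vertical compensation
        return dx, dy

    print(stabilising_shift(0.05, -0.02))   # rad/s readings from the vehicle's IMU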
The biggest difference between human vision and cameras is the fovea. Half of our optic nerve fibers serve a visual area about the size of a thumbnail held at arm's length. To replicate human vision you have to have a high-resolution camera, downsample the image, and then grant the AI access to the high-resolution imagery when requested.
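A crude sketch of that foveation scheme in Python with numpy (sizes and the attention point are made up; real foveated pipelines are fancier):

    import numpy as np

    def peripheral_view(frame, factor=8):
        """Cheap 'peripheral vision': block-average the full frame down by `factor`."""
        h = frame.shape[0] - frame.shape[0] % factor
        w = frame.shape[1] - frame.shape[1] % factor
        blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor, -1)
        return blocks.mean(axis=(1, 3))

    def foveal_crop(frame, cx, cy, size=256):
        """Full-resolution patch around the point the model asks to look at."""
        x0 = max(0, min(frame.shape[1] - size, cx - size // 2))
        y0 = max(0, min(frame.shape[0] - size, cy - size // 2))
        return frame[y0:y0 + size, x0:x0 + size]

    frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)   # stand-in camera frame
    coarse = peripheral_view(frame)        # what gets processed by default
    detail = foveal_crop(frame, 960, 540)  # high-res patch served "on request"
    print(coarse.shape, detail.shape)      # (135, 240, 3) (256, 256, 3)

The coarse stream runs all the time; the crop is the "look closer" request.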
The claim was impossibility with the sensor suite. It may well be impractical. In the long run, there's no better way to be wrong than claiming impossibility.
In which case he's even more wrong. Humans use a whole mess of different senses while driving, including hearing, the inertial sensitivity of the inner ear, and touch to feel vibrations from the car and the wheels on the road. Plus we have a huge amount of contextual information about the meaning of what we are seeing from life experience outside driving, which no Tesla that currently exists can ever have.
It’s a clever bit of snark, but absurdly wide of the mark. If that's actually what the Tesla engineers think, no wonder they're failing by their own criteria so completely.
The claim isn't that humans can drive as well using the Tesla cameras as they can in person, just that they can. That seems obviously true.
The (not explicitly made) sub-claim is that an AI can make up for the lack of audio, etc. by being smarter than a human and faster than a human, better able to multi-task than a human, and completely non-distractible. That's debatable, but not impossible.
...And furthermore, the neural nets and cameras Tesla uses are vastly inferior to our brains. Just because you can argue that a neural network of some kind uses the same basic structures as our brains doesn't mean that it can come within a light-year of what our brains can actually do.
Edit: I'd love to know why I'm being downvoted. Tesla cars guess depth with a neural net. Humans have the hardware for getting this data directly. Unless you have lidar, radar, or dedicated stereoscopic cameras, you don't have real/accurate depth data. And depth data like that stops your car from plowing into white trucks.
This is true. But most of the things perceived while driving are outside of the stereoscopic depth perception ability of humans. IIRC that stops around 20 meters or so.
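The geometry behind that: stereo depth error grows roughly with distance squared divided by the baseline, so a ~6.5 cm inter-pupil baseline runs out of useful precision fairly quickly. A toy calculation (the ~10 arcsecond disparity threshold is a commonly cited ballpark, not a measurement):

    import math

    baseline_m = 0.065                             # assumed human inter-pupil distance
    min_disparity_rad = math.radians(10 / 3600)    # ~10 arcsec stereoacuity threshold

    def depth_uncertainty(distance_m):
        """Approximate depth error: dZ ~= Z^2 * d(theta) / baseline."""
        return distance_m ** 2 * min_disparity_rad / baseline_m

    for d in (2, 5, 10, 20, 50):
        print(f"{d:>3} m -> ~{depth_uncertainty(d):.2f} m of depth uncertainty")

Whether that counts as stereo "stopping" at ~20 m depends on how much precision you need, but the squared growth is why it falls off quickly with distance.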
Unless I've been driving very differently from you, most of the things that I care about the precise distance of are well within 20 meters.
And moreover, it's lucky at best if my Model 3 acknowledges a car 20m away on the preview. It struggles with cars a few feet from me on the diagonal at stoplights.