Hacker News | apocalypses's comments

Saying WebGPU “only” adds compute shaders is crazy reductive and entirely misses how valuable an addition this is, from general-purpose compute through to the simplification of rendering pipelines, compositing passes, etc.

In any case it isn’t true. WebGPU also does away with OpenGL’s global state machine, which has always been a productivity headache and a source of bugs, and gives better abstractions with pipelines and command buffers.


I disagree. Yes, the global state is bad, but pipelines, render passes, and worst of all static bind groups and layouts are by no means better. Why would I need to create bind groups and bind group layouts for storage buffers? They're just buffers and references to them, so let me do the draw call and pass references to the SSBOs as arguments, rather than having to first create expensive bindings and then cache them because they are somehow expensive.
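To make the complaint concrete, here's a sketch of the create-and-cache dance for a single storage buffer. This is plain JavaScript with a stub standing in for a real GPUDevice (the layout and buffer objects below are placeholders, not real WebGPU handles), so it only illustrates the shape of the API, not a working renderer:

```javascript
// Stub standing in for a real GPUDevice; in a browser you'd use
// the device returned by navigator.gpu.requestAdapter()/requestDevice().
const device = {
  createBindGroup({ layout, entries }) {
    return { layout, entries }; // a real driver does allocation/validation here
  },
};

const bindGroupCache = new Map();

// Instead of "just pass the SSBO to the draw call", WebGPU makes you
// create a bind group per buffer -- and memoize it, since creation
// is expensive enough that you don't want to do it every frame.
function bindGroupFor(layout, buffer) {
  let bg = bindGroupCache.get(buffer);
  if (!bg) {
    bg = device.createBindGroup({
      layout,
      entries: [{ binding: 0, resource: { buffer } }],
    });
    bindGroupCache.set(buffer, bg);
  }
  return bg;
}

const layout = "storage-buffer-layout"; // placeholder for a GPUBindGroupLayout
const ssbo = { label: "particles" };    // placeholder for a GPUBuffer
console.log(bindGroupFor(layout, ssbo) === bindGroupFor(layout, ssbo)); // true
```

In CUDA the equivalent of all of the above is passing the pointer as a kernel argument.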

Also, compute could have easily been added to WebGL, making WebGL pretty much on-par with WebGPU, just 7 years earlier. It didn't happen because WebGPU was supposed to be a better replacement, which it never became. It just became something different-but-not-better.

If you had to do in CUDA even half of the completely unnecessary stuff that Vulkan forces you to do, CUDA would never have become as popular as it is.


I agree with you that there's a better programming model out there. But using a buffer in a CUDA kernel is the simple case. Descriptors exist to bind general-purpose work to fixed-function hardware. It's much more complicated when we start talking about texture sampling. CUDA isn't exactly great here either: kernel launches are more heavyweight than calling draw precisely because they defer some things, like validation, to the call site. Making descriptors explicit is verbose and annoying, but it makes resource switching more front of mind, which for workloads primarily using those fixed-function resources is a big concern. The ultimate solution here is bindless, but that obviously presents its own problems for having a nice general-purpose API, since you need to know all your resources up front. I do think CUDA is probably ideal for many users, but there are trade-offs here still.


It didn't happen because of Google; Intel did the work to make it happen.


I massively disagree. It would have taken the author approximately 1 minute to write the following high quality hack-n-slash commit message:

```
Big rewrites

* Rewrote X

* Deleted Y

* Refactored Z
```

Done


Many times it is “threw everything out and started over” because the fundamental design and architecture was flawed. Some things have no incremental fix.


Different people work differently.

Spending a minute writing commit messages while prototyping something will break my flow and derail whatever I’m doing.


There’s an address sanitizer in newer versions of Visual Studio now, but in my experience getting it to actually work with all of your project’s dependencies can be very hit and miss. The Windows Debug build config also does a lot more memory checking (part of the reason it’s so slow), so you’re not totally screwed for automated tools.
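For anyone looking for it: assuming Visual Studio 2019 16.9 or later, MSVC's AddressSanitizer is enabled with a single compiler switch (though, as noted above, dependencies built without it are where things tend to fall over):

```shell
# Requires VS 2019 16.9+; /Zi keeps debug info so ASan reports have symbols
cl /fsanitize=address /Zi main.cpp
```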


Really interesting article. For me this was an important takeaway:

>Although it should fail gracefully, it does not need to be optimised for failure.


99% of your code doesn't need optimization, period. If it's not inside at least two nested loops, it almost certainly doesn't meaningfully contribute to the total runtime.


Except for the one case where a slow approach is diffused across the whole application, and all of the code needs optimization.

That case is easy to hit with compilers, annotation processors, and similar tools. It's still one thing to optimize, but the impact is typically big.


That's only true for application code. If you write libraries, you can't know how your library functions will be called and have to assume that most of them could end up in performance-critical code paths.


Apple


One thing LLVM bitcode still can’t do is retain information about preprocessor directives, e.g. any platform-specific code for AVX2 vs SSE4. So unless you aim to write intrinsic-free code, it’s usually less performant/reliable to rely on the compiler’s automatic vectorisation, which results in worse codegen overall.


> One thing LLVM bitcode still can’t do is retain information about preprocessor directives, e.g. any platform-specific code for AVX2 vs SSE4.

LLVM supports per-function subtarget attributes, so you can compile individual functions with AVX2 support versus SSE4 support. The clang frontend even has a few different ways of triggering this support, with one method allowing you to specify per-CPU dispatch, and another merely specifying target attributes on a per-function basis.


I know GCC supports generating multiple versions of a function, compiled for different instruction set extensions. And this can also be done manually when you have hand-optimized implementations: https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Common-Functio... It's based on a GNU extension to the ELF format, not preprocessor directives.

I don't think any of that would conflict with using bitcode for everything you don't have a hand-optimized machine-specific implementation of.


Though worth noting that's only for specific AVX-512 CPUs.

You can still have vectorised support at regular precision with AVX2.


_mm256_rsqrt_ps has 12-bit accuracy in AVX.

https://software.intel.com/sites/landingpage/IntrinsicsGuide...


I have my own crappy USB-C story. I have a 2020 MacBook Pro 16” with Boot Camp installed. Recently I bought a USB-C hub; with it connected, if I try to boot into Windows (hold Alt at the Apple logo) the machine literally cannot boot. At all. I have to disconnect the hub in order to make any progress.


I work at a company that does lidar SLAM. You can actually produce really high-quality maps/SLAM with lidar/ToF sensors, and it’s a lot more robust/dense than visual/IMU mapping.


How does the price compare to traditional survey methods?


Depends on what you’re comparing to what. Big spinning lidars on cars are still expensive (a few thousand dollars) but are coming down in price. Handheld 3D scanners, on the other hand, might start to become obsoleted by a combination of cheap phone camera + lidar (really ToF) sensors that are evidently cheap enough to put on phones. They can actually produce really high-quality (SLAM) maps - Apple has figured out that they can have a much faster initial mapping phase by not having to do monocular mapping for their AR. So I guess I’m saying it’s cheaper and a bit worse, but it’s rapidly getting better.


I've had a play around with the new iPad. The big problem is that (as far as I can see) there's absolutely no way for a developer to get access to the underlying depth data.

I'm assuming it's coming in the next version of iOS, because you _can_ get access to the faceid depth data in a useful format.


Is it not available through `AVCaptureDepthDataOutput`? My understanding is that depth data is a separate channel stored in photos.


Maybe I screwed it up, I’m not the best developer ever. I took their sample code to extract the depth data from the front camera and this worked - switching to .back caused it to return a nil device, so I didn’t really know where to go after that.

