
Personally, I'm disappointed that there are so many superstitions and assumptions about GC design going around, which lead to discussions like this one that treat specialized designs with explicit tradeoffs, which both Go and BEAM are, as universally superior options. Or discussions that don't recognize that GC is in many scenarios an optimization over malloc/free and a significant complexity reduction over explicit management of arenas.

When it comes to Java - it has multiple GC implementations with different tradeoffs and degrees of sophistication. I'm not very well versed in their details beyond the fact that pretty much all of them are quite liberal with their use of host memory. So I approach it by assuming that at least some of them resemble the GC implementation in .NET, given extensive evidence that they have similar (throughput) performance characteristics under allocation-heavy scenarios.

As for .NET itself, in server scenarios it uses SRV GC, which has per-core heaps (their count and sizing are now scaled dynamically to the workload profile, leading to a much smaller RAM footprint) and multi-threaded collection, which lends itself to very high throughput and linear scaling with cores even on very large hosts thanks to minimal contention (think 128C 1TiB RAM; sometimes you need to massage it with flags for this, but it's nowhere near the amount of ceremony required by Java).
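For reference, opting into SRV GC is a one-line project setting (a csproj sketch; the same switch can also be set via runtimeconfig.json or the DOTNET_gcServer environment variable):

```xml
<!-- csproj: opt in to the server GC described above -->
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
  <!-- background (concurrent) collection is on by default; shown for clarity -->
  <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
</PropertyGroup>
```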

Both SRV and WKS GC implementations use background collection for the Gen2, large, and pinned object heaps. Collection of Gen0 and Gen1 is pausing by design, as that yields much better throughput, and the pause times are short enough anyway.

They are short enough that modern .NET versions end up having better p99 latency than Go on multi-core, throughput-saturated nodes. Given a decent enough codebase, you only ever need to worry about GC pause impact once you go into the territory of systems with hard realtime requirements. One of the better practical examples of this in open source is osu!, which must run its game loop at 1000 Hz - only 1 ms of budget! This does pose challenges and requires much more hands-on interaction with the GC, like dynamically switching GC behavior depending on the scenario: https://github.com/dotnet/runtime/issues/96213#issuecomment-... This, however, would be true of any language with automatic memory management, if it's possible to implement such a system in it in the first place.
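A minimal sketch of what that hands-on interaction can look like. GCSettings.LatencyMode and GCLatencyMode are real .NET APIs; the FrameLoop framing here is illustrative, not osu!'s actual code:

```csharp
using System;
using System.Runtime;

public class FrameLoop
{
    // Illustrative sketch: temporarily ask the GC to avoid blocking
    // collections while a latency-critical section runs.
    public static void RunCriticalSection(Action body)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            // SustainedLowLatency suppresses blocking Gen2 collections
            // for the duration of the critical region.
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            body();
        }
        finally
        {
            // Restore whatever mode was active before.
            GCSettings.LatencyMode = previous;
        }
    }

    public static void Main()
    {
        RunCriticalSection(() => Console.WriteLine(GCSettings.LatencyMode));
    }
}
```

Restoring the previous mode in a finally block matters: the low-latency modes trade memory footprint for pause behavior, so you only want them active inside the critical region.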



I'm not gonna say it with 100% certainty, but having started using C# a few years ago, I think that much of the lower latency is simply because C# probably produces an order of magnitude less garbage than Java in practice (probably less than Go as well). The C# GCs generally resemble the older Java GCs more than G1, or especially the ZGC collector (which includes software-based read barriers, whereas most other collectors only use write barriers).

Small things like tuples (combining multiple values in outputs) and out parameters relax the burden on the runtime, since programmers don't need to create objects just to send several things back out of a function.
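For illustration, both patterns hand multiple results back without a heap allocation - a tuple return and an out parameter (hypothetical FindMinMax/TryFindMax helpers):

```csharp
using System;

public class MinMax
{
    // Returning a ValueTuple keeps both results in registers/on the stack:
    // no heap object is created just to carry two ints out.
    public static (int Min, int Max) FindMinMax(int[] values)
    {
        int min = int.MaxValue, max = int.MinValue;
        foreach (int v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return (min, max);
    }

    // The older out-parameter style achieves the same thing.
    public static bool TryFindMax(int[] values, out int max)
    {
        max = int.MinValue;
        foreach (int v in values) if (v > max) max = v;
        return values.Length > 0;
    }

    public static void Main()
    {
        var (min, max) = FindMinMax(new[] { 3, 1, 4, 1, 5 });
        Console.WriteLine($"{min} {max}"); // prints "1 5"
    }
}
```

In Java, the idiomatic equivalent is returning a small result object (or a boxed pair), which is exactly the kind of short-lived garbage being discussed.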

But the real kicker probably comes from the low-level components, be it the HTTP server with ValueTasks or the C# dictionary type getting memory savings just by having struct types and proper generics. I remember reading an article from years ago about rewriting the C# dictionary class that reduced memory allocations by something like 90%.
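A sketch of why reified generics matter here: with Dictionary&lt;int, int&gt;, keys and values are stored inline in the dictionary's internal entry array, so a pre-sized fill allocates essentially nothing per element. GC.GetAllocatedBytesForCurrentThread is a real API; the numbers below are illustrative:

```csharp
using System;
using System.Collections.Generic;

public class StructGenerics
{
    // Returns GC bytes allocated while inserting `count` entries
    // into a pre-sized Dictionary<int, int>.
    public static long FillAllocations(int count)
    {
        var map = new Dictionary<int, int>(count); // pre-size: no resizes below

        long before = GC.GetAllocatedBytesForCurrentThread();
        // Because generics are reified, the int keys and values go
        // straight into the entry array: no per-element boxing.
        for (int i = 0; i < count; i++) map[i] = i * 2;
        long after = GC.GetAllocatedBytesForCurrentThread();

        return after - before;
    }

    public static void Main()
    {
        Console.WriteLine($"bytes allocated: {FillAllocations(1_000)}");
    }
}
```

Java's HashMap&lt;Integer, Integer&gt; by contrast boxes every key and value and allocates a Node object per entry, which is where much of the garbage-volume gap comes from.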


When you say "C# GC's generally resemble the older Java GC's more than G1 or especially the Zgc", what do you have in mind? I'm skimming through the G1 description once again and it looks quite similar to .NET's GC implementation. As for ZGC, introducing read barriers is a very strong no in .NET, because it adds performance overhead to all the paths that were previously free.

On allocation traffic, I doubt average code in C# allocates less than Go - the latter puts quite a lot of emphasis on plain structs, and because Go has very poor GC throughput, the only way to explain tolerable performance in the common case is that Go still allocates less. Of course, this will change now that more teams adopt Go and start the classic interface spam, writing abstractions that box structs into interfaces to cope with the inexpressive and repetition-heavy nature of Go.
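The boxing cost described here is easy to demonstrate in C#, which has the same struct-into-interface behavior (illustrative Circle/IShape types; allocation sizes are approximate):

```csharp
using System;

public interface IShape { double Area(); }

// A plain struct: lives on the stack or inline in arrays, no GC cost.
public struct Circle : IShape
{
    public double Radius;
    public double Area() => Math.PI * Radius * Radius;
}

public class BoxingDemo
{
    // Returns GC bytes allocated by copying `count` structs behind an interface.
    public static long MeasureBoxing(int count)
    {
        var circles = new Circle[count]; // one array, structs stored inline
        for (int i = 0; i < count; i++) circles[i].Radius = i;

        long before = GC.GetAllocatedBytesForCurrentThread();
        // Storing the structs behind an interface boxes each one:
        // `count` separate heap objects where there used to be none.
        var boxed = new IShape[count];
        for (int i = 0; i < count; i++) boxed[i] = circles[i];
        long after = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(boxed);
        return after - before;
    }

    public static void Main()
    {
        // Roughly 24 bytes per box plus the reference array itself.
        Console.WriteLine(MeasureBoxing(1_000) > 8_000); // prints "True"
    }
}
```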

Otherwise, both .NET and Java GC implementations are throughput-focused, even the ones that target few-core, smaller applications, while Go's GC focuses on low-to-moderate allocation traffic on smaller hosts with consistent performance, and regresses severely when its capacity to reclaim memory in time is exceeded. You can expect anywhere from ~4x up to ~16-32x or more (SRV GC scales linearly with cores) difference in maximum allocation throughput between Go and .NET: https://gist.github.com/neon-sunset/c6c35230e75c89a8f6592cac...



