Agreed. The ARM AGI CPU supports a newer version of the vectorized instructions and has matrix math extensions that the AmpereOne M doesn’t. It also has almost twice the memory bandwidth. On paper at least, the AGI CPU seems like a better choice for AI workloads. Ampere is really pushing AI workload use cases for the AmpereOne M, so this makes their lives a lot harder.
Neoverse V3 is better than any core used or designed by Ampere, but it does not have matrix math extensions.
Neoverse V3 is the server version of Cortex-X4 and it is an Armv9.2-A CPU, with SVE2, but without SME/SME2.
The matrix math extensions, i.e. SME/SME2, are present only in the latest generation of Arm cores (the C1 cores), which implement the Armv9.3-A ISA, and also in recent Apple cores.
Neoverse V3 is also used in AWS Graviton5 and in a few NVIDIA products, e.g. Thor, and it is also pretty much equivalent to the Skymont and Darkmont Intel E-cores used in the Lunar Lake, Arrow Lake, Panther Lake and Clearwater Forest CPUs.
I own one of these systems. My interpretation is the Ampere systems are targeted at lower cost scale out. The Ampere Altra CPUs are limited to DDR4. The raw single core performance doesn’t match Intel or AMD offerings. You get a lot of cores for a lower hardware cost and at lower energy usage.
The Nvidia CPUs are designed for a very specific use case. They are designed for high performance with less concern about cost control.
The newer AmpereOne CPUs use DDR5 with the AmpereOne M supporting even higher memory bandwidth. Even then, I doubt the AmpereOne CPUs will match the performance of the Nvidia Rubin CPUs. But the Ampere processors are available for general use. I am guessing that Nvidia is only going to sell the complete rack system and only to high-volume customers.
Since you are very focused on specific Nvidia hardware, I wonder if Nvidia would either buy you out to benefit from your tech or implement their own version without your involvement. Seems risky to me as a potential customer.
I used to think of D as the same category as C# and Java, but I realized that it has two important differences. (I am much more experienced with Java/JVM than C#/.Net, so this may not all apply.)
1. Very low-overhead calling of native libraries. Wrapping native libraries from Java using JNI requires quite a bit of complex code and build-system configuration, and the calls themselves carry overhead. So most projects only use libraries written in a JVM language -- the integration is not nearly as widespread as in the Python world. The Foreign Function and Memory (FFM) API is supposed to make this a lot easier and faster. We'll see if projects start to integrate native libraries more frequently. My understanding is that foreign function calls in Go are also expensive.
2. Doesn't require a VM. Java and C# require a VM. D (like Go) generates native binaries.
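The FFM path mentioned in point 1 can be sketched. A minimal example, assuming JDK 22+ (where java.lang.foreign is final), calling libc's strlen with no JNI stubs or separate build step:

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class FfmDemo {
    // Look up and call the C library's strlen through the FFM API.
    static long nativeStrlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        MethodHandle strlen = linker.downcallHandle(
            linker.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copy the Java string into native memory as a NUL-terminated C string.
            MemorySegment cStr = arena.allocateFrom(s);
            return (long) strlen.invoke(cStr);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(nativeStrlen("hello")); // prints 5
    }
}
```

Compare this with JNI, where the same call would need a generated header, a C shim compiled per platform, and registration glue.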
As such, D is a really great choice when you need to write glue code around native libraries. D makes it easy, the calls are low overhead, and there isn't much need for data marshaling and un-marshaling because the data type representations are consistent. D has lower cognitive overhead, more guardrails (which are useful when quickly prototyping code), and a faster / more convenient compile-debug loop, especially with respect to C++ templates versus D generics.
Native calls from C# are MUCH better than the Java experience. It's a massive part of why I chose it when it came out. Today, C# is pretty great... though not every MS dev shop is; like Java shops, they are often excessively complex for their own sake.
On #2, I generally reach for either TS/JS with Deno if I need a bit more than a shell script, or Rust for more demanding things. I like C# okay for the work stuff that I do currently though.
what are you referring to regarding Java? I'm aware C# has AOT (and il2cpp for Unity projects) but I don't recall hearing about any sort of Java native binary that isn't just shipping a VM and java bytecode (ignoring the short-lived GNU java compiler).
Java has had AOT compilers since around 2000; they just happened to be commercial. Excelsior JET was the most famous one.
There were several vendors selling AOT compilers for embedded systems, nowadays they are concentrated into two vendors, PTC and Aicas.
Then you have the free-as-in-beer compilers GraalVM and OpenJ9, which are basically the reason why companies like Excelsior ended up closing shop.
Also .NET has had many flavours, starting with NGEN, Mono AOT, Bartok, MDIL, .NET Native, and nowadays Native AOT.
Both ecosystems are similar to Lisp/Scheme nowadays, having a mix of JIT and AOT toolchains, each with its pluses and minuses, allowing developers to pick and choose the best approach for their deployment scenario.
The ACM Recommender Systems conference is one of the leading venues in the field. You might check out what papers were accepted for the 2024 and 2025 conferences:
This seems like a great way to group semantically-related statements, reduce variable leakage, and reduce the potential to silently introduce additional dependencies on variables. Seems lighter weight (especially from a cognitive load perspective) than lambdas. Appropriate for when there is a single user of the block -- avoids polluting the namespace with additional functions. Can be easily turned into a separate function once there are multiple users.
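Assuming the construct under discussion is a plain braced block (as in C-family languages), a minimal Java sketch of the scoping benefit — the names here are made up for illustration:

```java
public class BlockScopeDemo {
    static int compute() {
        int result;
        {
            // Temporaries live only inside this block, so later code
            // can't silently grow a dependency on them.
            int a = 2;
            int b = 3;
            result = a * b;
        }
        // a and b are out of scope here; only `result` escapes.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 6
    }
}
```

If a second caller ever needs the same computation, the block's body lifts directly into a named method with `a` and `b` as locals, exactly as described above.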
They are analyzing models trained on classification tasks. At the end of the day, classification is about (a) engineering features that separate the classes and (b) finding a way to represent the boundary. It's not surprising to me that they would find these models can be described using a small number of dimensions and that they would observe similar structure across classification problems. The number of dimensions needed is basically a function of the number of classes. Embeddings in 1 dimension can linearly separate 2 classes, 2 dimensions can linearly separate 4 classes, 3 dimensions can linearly separate 8 classes, etc.
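The counting argument above can be sketched: put each class in its own orthant, and d sign tests (each one a linear boundary through the origin) distinguish 2^d classes. A toy Java illustration — `classify` and the corner embeddings are hypothetical names for this sketch, not anything from the paper:

```java
public class OrthantDemo {
    // Map a d-dimensional embedding to one of 2^d labels by reading
    // the sign of each coordinate; each sign test is a linear boundary.
    static int classify(double[] embedding) {
        int label = 0;
        for (int i = 0; i < embedding.length; i++) {
            if (embedding[i] > 0) {
                label |= (1 << i);
            }
        }
        return label;
    }

    public static void main(String[] args) {
        // 2 dimensions -> 4 distinguishable classes, one per quadrant.
        System.out.println(classify(new double[]{ 1.0,  1.0})); // 3
        System.out.println(classify(new double[]{-1.0,  1.0})); // 2
        System.out.println(classify(new double[]{ 1.0, -1.0})); // 1
        System.out.println(classify(new double[]{-1.0, -1.0})); // 0
    }
}
```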
Maybe two different things here: SBCs that run Linux versus microcontrollers (MCUs).
MCUs are lower power, have less overhead, and can perform hard real-time tasks. Most of what Arduino focuses on are MCUs; the Raspberry Pi equivalent is the Pico.
In my experience, the key thing is the library ecosystem for the C++ runtime environment. There are a large number of Arduino and third-party high-level libraries provided through their package management system that make it really easy to use sensors and other hardware without needing to write intermediate level code that uses SPI or I2C. And it all integrates and works together. The Pico C/C++ SDK is lower level and doesn’t have a good library / package management story, so you have to read vendor data sheets to figure out how to communicate with hardware and then write your own libraries.
It’s much more common for less experienced users to use MicroPython. It has a package management and library ecosystem. But it’s also harder to write anything of any complexity that fits within the small available RAM without calling gc.collect() every other line.
Yes. One looming concern here is that if the new Arduino is happy locking stuff down, the Arduino IDE story could end up being murkier like the PlatformIO story.