128GB is the max RAM that the current Strix Halo supports, with ~250GB/s of memory bandwidth. The Mac Studio maxes out at 256GB with ~900GB/s of memory bandwidth. They are in different categories of performance, and even performance-per-dollar is worse. (~$2700 for Framework Desktop vs $7500 for Mac Studio M3 Ultra)
I would honestly guess that this is just a small amount of tweaking on top of the Sonnet 4.x models. It seems like providers are rarely training new 'base' models anymore. We're at a point where the gains are more from modifying the model's architecture and doing a "post" training refinement. That's what we've been seeing for the past 12-18 months, iirc.
> Claude Sonnet 4.6 was trained on a proprietary mix of publicly available information from the internet up to May 2025, non-public data from third parties, data provided by data-labeling services and paid contractors, data from Claude users who have opted in to have their data used for training, and data generated internally at Anthropic. Throughout the training process we used several data cleaning and filtering methods including deduplication and classification. ... After the pretraining process, Claude Sonnet 4.6 underwent substantial post-training and fine-tuning, with the intention of making it a helpful, honest, and harmless assistant.
Does anybody know when Codex is going to roll out subagent support? That has been an absolute game changer in Claude Code. It lets me run with a single session for so much longer and chip away at much more complex tasks. This was my biggest pain point when I used Codex last week.
I've been working on decompiling Dance Central 3 with AI and it's been insane. It's an Xbox 360 game that leverages the Kinect to track your body as you dance. It's a great game, but even with an emulator, it's still dependent on the Kinect hardware, which is proprietary and in limited supply.
Fortunately, a Debug build of this game was found on a dev unit (somehow), and that build does _not_ have crazy optimizations in place (Link-time Optimization) that make this feat impossible.
I am not somebody that is deep on low level assembly, but I love this game (and Rock Band 3 which uses the same engine), and I was curious to see how far I could get by building AI tools to help with this. A project of this magnitude is ... a gargantuan task. Maybe 50k hours of human effort? Could be 100k? Hard to say.
Anyway, I've been able to make significant progress by building tools for Claude Code to use and just letting Haiku rip. Honestly, it blows me away. Here is an example that is 100% decompiled now (it compiles to the exact same code as in the binary the devs shipped).
My branch has now added and worked on over 1k functions[0]. Some of it is slop, but I wrote a skill that's been able to get the code quite decent with another pass. I even implemented VMX128 (custom 360-specific CPU instructions) in Ghidra and m2c to allow them to decompile more code. Blows my mind that this is possible with just hours of effort now!
I spent way too many hours writing this all today, but I wanted to get this pushed out for others to learn from. There is a ton of detail in this notes file[0] that Claude Code helped me assemble.
If anybody has any suggestions or questions, shoot! It's 4am though so I'll be back in a bit. These CVEs are quite brutal.
Unfortunately not. It's still very broken, and next year it will be worse for a ton of people. I got AI to write a short answer for you:
> Short version: Obamacare never turned into “free primary care for everyone,” it was just a bunch of rules and subsidies bolted onto the same old private-insurance maze. It helped at the margins (more people covered, protections for pre-existing conditions), but premiums/deductibles can still go nuclear if you’re in the wrong income bracket, state, or employer situation. From an EU/Poland perspective it’s not a public health system at all, just a slightly nerfed market where you still get to roll the dice every year.
There is also a tradeoff between different vocabulary sizes (how many entries exist in the token -> embedding lookup table) that informs the current shape of tokenizers and LLMs. (Below is my semi-armchair stance, but you can read more in depth here[0][1].)
If you tokenized at the character level ('a' -> embedding) then your vocabulary size would be small, but you'd need more tokens to represent most content. (And the cost of context scales non-linearly, iirc, like n^2 for standard attention.) This would also be a bit more 'fuzzy' in terms of teaching the LLM to understand what a specific token should 'mean'. The letter 'a' appears in a _lot_ of different words, so it's more ambiguous for the LLM.
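To put rough numbers on the character-level tradeoff, here's a minimal sketch. The two toy tokenizers and the `pairwise_attention_cost` helper are made up for illustration, and the cost model assumes plain self-attention where every token is compared against every other token (n * n); real models and real tokenizers are more involved.

```python
def char_tokens(text: str) -> list[str]:
    # Character-level tokenization: one token per character (spaces included).
    return list(text)


def word_tokens(text: str) -> list[str]:
    # Word-level tokenization: one token per whitespace-separated word.
    return text.split()


def pairwise_attention_cost(n_tokens: int) -> int:
    # Assumed cost model: every token attends to every token, so n * n comparisons.
    return n_tokens * n_tokens


text = "the cat sat on the mat"
chars = char_tokens(text)
words = word_tokens(text)

# Small vocabulary but long sequence vs. larger vocabulary but short sequence.
print(len(set(chars)), len(chars), pairwise_attention_cost(len(chars)))
print(len(set(words)), len(words), pairwise_attention_cost(len(words)))
```

On this toy sentence the character version uses 22 tokens against 6 for the word version, so under the assumed cost model it's 484 comparisons vs 36.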
On the flip side: What if you had one entry in the tokenizer's vocabulary for each word that exists? Well, it'd be far more than the ~100k entries used by popular LLMs, and that has some computational tradeoffs. When you calculate the probability of each 'next' token via softmax, you have to compute a score for every entry in the vocabulary, and a bigger vocabulary also increases the size of certain layers within the LLM (more memory + compute required for each token, basically).
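A back-of-the-envelope sketch of why vocabulary size costs memory and compute. The hidden size of 4096 and the 1M-word vocabulary are assumptions picked for illustration, not figures from any particular model, and `embedding_params` is a hypothetical helper.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    # One exp() and one division per vocabulary entry, so work grows with vocab size.
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def embedding_params(vocab_size: int, d_model: int) -> int:
    # The token -> embedding lookup table has one d_model-sized row per vocab entry.
    return vocab_size * d_model


D_MODEL = 4096  # assumed hidden size, for illustration only

print(embedding_params(100_000, D_MODEL))    # ~100k-entry vocabulary
print(embedding_params(1_000_000, D_MODEL))  # hypothetical one-entry-per-word vocabulary

probs = softmax([1.0, 2.0, 3.0])  # tiny 3-entry "vocabulary"
print(probs)
```

With those assumed sizes, going from 100k to 1M entries takes the lookup table alone from about 410M to about 4.1B parameters, and the output layer that produces the logits grows the same way.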
Additionally, you run into a new problem: 'Rare Tokens'. Basically, if you have infinite tokens, you'll run into specific tokens that only appear a handful of times in the training data and the model is never able to fully imbue the tokens with enough meaning for them to _help_ the model during inference. (A specific example being somebody's username on the internet.)
Fun fact: These rare tokens, often called 'Glitch Tokens'[2], have been used for all sorts of shenanigans[3] as humans learn to break these models. (This is my interest in this as somebody who works in AI security)
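The rare-token problem is easy to see with a frequency count. This is only a toy sketch: the corpus, the whitespace tokenizer, and the `min_count` threshold are all made up, and real glitch-token hunting looks at trained embeddings rather than raw counts.

```python
from collections import Counter


def rare_tokens(corpus: list[str], min_count: int = 2) -> list[str]:
    # Count how often each whitespace-separated token appears across the corpus,
    # then return the ones seen fewer than min_count times.
    counts = Counter(tok for doc in corpus for tok in doc.split())
    return sorted(tok for tok, n in counts.items() if n < min_count)


corpus = [
    "the cat sat",
    "the dog sat",
    "xX_user123_Xx posted the cat",  # hypothetical username that shows up once
]

rare = rare_tokens(corpus)
print(rare)
```

Anything that lands in `rare` gets its own vocabulary slot but almost no training signal, which is the situation described above.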
As LLMs have improved, models have pushed towards the largest vocabulary they can get away with without hurting performance. This is about where my knowledge on the subject ends, but there have been many analyses done to try to compute the optimal vocabulary size. (See the links below)
One area that I have been spending a lot of time thinking about is what Tokenization looks like if we start trying to represent 'higher order' concepts without using human vocabulary for them. One example being: Tokenizing on LLVM bytecode (to represent code more 'densely' than UTF-8) or directly against the final layers of state in a small LLM (trying to use a small LLM to 'grok' the meaning and hoist it into a more dense, almost compressed latent space that the large LLM can understand).
It would be cool if Claude Code, when it's talking to the big, non-local model, was able to make an MCP call to a model running on your laptop to say 'hey, go through all of the code and give me the general vibe of each file, then append those tokens to the conversation'. It'd be a lot fewer tokens than just directly uploading all of the code, and it _feels_ like it would be better than uploading chunks of code based on regex like it does today...
This immediately makes the model's inner state (even more) opaque to outside analysis though. e.g., like why using gRPC as the protocol for your JavaScript front-end sucks: Humans can't debug it anymore without other tooling. JSON is verbose as hell, but it's simple and I can debug my REST API with just network inspector. I don't need access to the underlying Protobuf files to understand what each byte means in my gRPC messages. That's a nice property to have when reviewing my ChatGPT logs too :P
> One area that I have been spending a lot of time thinking about is what Tokenization looks like if we start trying to represent 'higher order' concepts without using human vocabulary for them. One example being: Tokenizing on LLVM bytecode (to represent code more 'densely' than UTF-8)
I've had similar ideas in the past. High level languages that humans write are designed for humans. What does an "LLM native" programming language look like? And, to your point about protobufs vs JSON, how does a human debug it when the LLM gets stuck?
> It would be cool if Claude Code, when it's talking to the big, non-local model, was able to make an MCP call to a model running on your laptop to say 'hey, go through all of the code and give me the general vibe of each file, then append those tokens to the conversation'. It'd be a lot fewer tokens than just directly uploading all of the code, and it _feels_ like it would be better than uploading chunks of code based on regex like it does today...
That's basically the strategy for Claude's new "Skills" feature, just in a more dynamic/AI driven way. Claude will do semantic search through YAML frontmatter to determine what skill might be useful in a given context, then load that entire skill file into context to execute it. Your idea here is similar: use a small local model to summarize each file (basically dynamically generate that YAML frontmatter), feed those summaries into the larger model's context, and then it can choose which file(s) it cares about based on that.
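A rough sketch of the "summarize, then pick" flow, to make the idea concrete. Everything here is hypothetical: the file names, the summaries (imagine a small local model wrote them), and the scoring, which is naive word overlap standing in for whatever semantic matching a real system would use. It is not how Claude's Skills are actually implemented.

```python
def score(query: str, summary: str) -> int:
    # Naive stand-in for semantic search: count words shared by query and summary.
    return len(set(query.lower().split()) & set(summary.lower().split()))


def pick_files(query: str, summaries: dict[str, str], top_k: int = 1) -> list[str]:
    # Rank files by overlap score (ties broken by path) and keep the best top_k.
    ranked = sorted(summaries, key=lambda path: (-score(query, summaries[path]), path))
    return ranked[:top_k]


# Hypothetical per-file summaries, as a small local model might produce them.
summaries = {
    "auth.py": "handles user login and session tokens",
    "render.py": "draws charts and exports images",
    "db.py": "database connection pool and queries",
}

chosen = pick_files("fix the login session bug", summaries)
print(chosen)
```

Only the short summaries go into the big model's context up front; the full contents of the files in `chosen` would be loaded afterwards.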