Hacker News | skryl's comments

I built mlx-onnx (as part of mlx-ruby), a standalone exporter that converts MLX graphs to ONNX.

  Web Demo: https://skryl.github.io/mlx-ruby/demo

  Repo: https://github.com/skryl/mlx-onnx

  It provides:

  - MLX callable -> ONNX export
  - Python API + native C++ API

  The goal is to make it easier to move MLX models into ONNX tooling (onnxruntime, validation, downstream deployment), while keeping
  export behavior testable and explicit.

  Quick example:

  import mlx.core as mx
  import mlx_onnx as mxonnx

  def f(x):
      return mx.exp(x + 1.0)

  x = mx.array([[1.0, 2.0]], dtype=mx.float32)

  mxonnx.export_onnx("model.onnx", f, x, model_name="demo", opset=18)
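
One way to sanity-check an export like the one above is to compare whatever the ONNX runtime returns against an independently computed reference. The following is only a sketch using the Python standard library: `reference_f` and `allclose` are hypothetical helper names, not part of mlx-onnx, and loading and running model.onnx is deliberately left out.

```python
import math

def reference_f(rows):
    # Pure-Python version of f(x) = exp(x + 1.0), applied elementwise.
    return [[math.exp(v + 1.0) for v in row] for row in rows]

def allclose(a, b, rel_tol=1e-5, abs_tol=1e-6):
    # Elementwise comparison of two nested lists; shapes must match.
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True

# Reference values for the same input used in the export example.
expected = reference_f([[1.0, 2.0]])
```

The runtime output, converted to nested lists, could then be passed to `allclose(runtime_output, expected)`. The tolerances are assumptions sized for float32 and may need loosening for other dtypes.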

If you try it and hit an issue, I’d love a repro.

Performance is competitive with Python MLX. On small models, Ruby is within 0.55-1.54x of Python depending on model type and device. The heavy lifting happens in the same C++ / Metal runtime either way.

Ruby deserves better ML tooling. The language is expressive enough that model definitions can actually be more readable than their Python equivalents. Run gem install mlx to try it out.


Performance per watt is better than H100 and B200, performance per watt per $ is worse than B200, and it does FP8 just fine.

https://arxiv.org/pdf/2503.11698


One caveat is that this paper only covers training, which can be done on a single CS-3 using external memory (swapping weights in and out of SRAM). There is no way that a single CS-3 will hit this record inference performance with external memory, so this was likely done with 10-20 CS-3 chips and the full model in SRAM. You definitely can't compare tokens/$ between that kind of setup and a DGX.


Thanks for the correction. They are currently using FP16 for inference according to OpenRouter. I had thought that implied they could not use FP8, given the pressure to use as little memory as possible when relying solely on SRAM. I wonder why they opted for FP16 instead of FP8.


Performance per watt per dollar is a useless metric as calculated. You can't spend more money on B200s to get more performance per watt.
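
A quick sketch of why the metric behaves oddly, using made-up numbers that are not real specs for any chip: performance per watt stays the same no matter how many identical units you buy, so dividing it by total spend just shrinks the number as the purchase grows. The function name `metrics` and all values are hypothetical.

```python
def metrics(perf, watts, dollars, units=1):
    # Totals scale linearly with the number of identical units purchased.
    total_perf = perf * units
    total_watts = watts * units
    total_dollars = dollars * units
    perf_per_watt = total_perf / total_watts
    return {
        "perf_per_watt": perf_per_watt,
        "perf_per_dollar": total_perf / total_dollars,
        "perf_per_watt_per_dollar": perf_per_watt / total_dollars,
    }

# Hypothetical device: 100 perf units, 50 W, 10 dollars.
one = metrics(perf=100.0, watts=50.0, dollars=10.0, units=1)
four = metrics(perf=100.0, watts=50.0, dollars=10.0, units=4)

for name in one:
    print(name, one[name], four[name])
```

With these assumed inputs, perf per watt and perf per dollar are identical for one unit and four units, while perf per watt per dollar falls by a factor of four. The number depends on how much you buy rather than on the hardware.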


Trusted (http://usetrusted.com) | San Francisco | Onsite, Full-time | $100-$150k, 0.5-1.0% equity

Contact: alex@usetrusted.com

Trusted alleviates the pain parents face in discovering, scheduling and paying for high quality, vetted child care.

We are a small team working on transforming the child care industry and helping countless parents in the process. We care deeply about the quality of the service we provide, but we also pride ourselves on the wellbeing and happiness of our team. Our day-to-day usually involves a standup around 10am and a few 10-minute exercise breaks throughout the day, and we normally wrap things up between 6pm and 7pm.

We're looking for an experienced front-end engineer to lead client-side JavaScript development and grow both our internal and customer-facing web clients. Because of the small size of our team, we love engineers who feel comfortable across the whole stack but specialize in something they love!

Skills We Are Looking For:

  * 5+ years of client-side JavaScript development
  * Deep knowledge of React, Angular, Backbone, or another client-side framework
  * Experience with UI/UX testing

Bonus:

  * Design chops
  * A portfolio which showcases your previous work
  * A GitHub account with cool projects in it
  * Experience with server-side technologies (Ruby, Python, PHP, etc)
  * Mobile development experience


If you're looking for the rest of the private.xml file (HYPER + H/J/K/L) mappings...

https://gist.github.com/skryl/8143550


Thanks for writing this. If anyone else is ever in a similar situation, please do your best to get out of the room. Even if you think your attacker might be hurt and is no longer restraining you, just get out. Get out and THEN call someone. If you don't have your phone, knock on doors, whatever it takes. Staying put and waiting for the attacker to leave is a BAD idea, even if you get a chance to use a phone.


For the same reason that we're all still on 12 months, 30ish days, 24h, 60m, 60s, 1000ms time.


When you put it like that, it's a bit weird. Is there an imperial fraction of a second that somehow never caught on?


This is because our system of time is based on the Earth circling the Sun, and it works best with numbers that slice up a circle evenly: 360 degrees, base 60.


Works for me in Chrome.


I just didn't get that you're supposed to drop a file from your computer. You might clarify the wording a bit.


It's like calling private methods on a class ;) Not easily accessible but once you figure out how to do it you can accomplish certain feats that may have seemed impossible beforehand. Alas, with great power comes great responsibility. Ever seen Limitless?

