Hacker Newsnew | past | comments | ask | show | jobs | submit | 414owen's commentslogin

Wow, okay. I would imagine this makes mathematicians quite angry? I guess you're responsible for all the operations you use in your proof being well-behaved.

It sounds like subtraction over Nats needs to be split into `sub?`, and `sub!`, the former returning an option, and the latter crashing, on underflow, as is the Lean convention?

To use the default `sub`, you should need to provide a witness that the minuend is >= the subtrahend...

The version with silent underflow is still useful, it should just be called `saturatingSub`, or something, so that mathematicians using it know what they're getting themselves into...


Word.


Libre Office


re2c works like a charm. I'm generally confident that I couldn't write faster scanners by hand if I tried.


This code was written a while ago, but I decided it merited a write-up recently.


I have DietPi running on a Raspberry pi Zero W (armv6l), which is a system that's supported by barely any distros nowadays.

It was easy to preseed, has the absolute bare minimum amount of processes running, and is altogether a fantastic distro. Highly recommended.


For what it's worth, plain Debian runs fine on those as well, most easily via https://raspi.debian.net/tested-images/


This is pretty much `assembly language the game`: https://tomorrowcorporation.com/humanresourcemachine

It's not a useful architecture, but it teaches the thought process really well, and you end up discovering a lot of optimization naturally.

For this article, I'm measuring every step to see what the performance implications of the changes are, which, along with some educated guesses and some googling/reading other articles, was enough for me to figure out what was going on.

In part two (https://owen.cafe/posts/the-same-speed-as-c/) especially, I didn't know what was going on with the benchmarks for a long time. Eventually I got lucky and made a change, which led to a hypothesis, which lead to more tests, which led to a conclusion.


The array lookup approach taken in part two:

https://owen.cafe/posts/the-same-speed-as-c/

But taking the length of the string as a parameter is not, because that changes the problem statement (making the solution vectorizable)

Also note that you'll try to read element -1 of the input. You probably want to change the `>=` to a `>`


Kind words, much appreciated!


So I actually did try that, but and IIRC it didn't produce a CMOV with either gcc or clang. I didn't put it in the repo because it wasn't an improvement (on my machine) and I decided not to write about it.

Maybe you get different results though?


Very interesting approach. I should probably have specified that the somewhat naive assembly in `02-the-same-speed-as-c/loop-5.x64.s` is the fastest version I have.

On my machine I'm getting 0.244s for `loop-5.x64.s` and 0.422s for your implementation above.

I'm not sure why exactly we're seeing this discrepancy, and for what it's worth your implementation looks faster to me. I guess this is why you need to always benchmark on the hardware you're going to be running the code on...


I rerun the benchmark vs loop-5 and loop-7 from the second post. Runtime is basically the same on my machine.

I would have expected yours to be faster given that it needs to execute fewer instructions per loop iteration. Though maybe the CPU can run `adc` on more ports compared to a load from memory?

    Summary
      '01-six-times-faster-than-c/bench-x64-8 1000 1' ran
        1.00 ± 0.00 times faster than '02-the-same-speed-as-c/bench-x64-7 1000 1'
        1.66 ± 0.00 times faster than '01-six-times-faster-than-c/bench-x64-7 1000 1'

    Summary
      '01-six-times-faster-than-c/bench-x64-8 1000 1' ran
        1.01 ± 0.00 times faster than '02-the-same-speed-as-c/bench-x64-5 1000 1'
        1.66 ± 0.00 times faster than '01-six-times-faster-than-c/bench-x64-7 1000 1'


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: