Immersion cooling is getting big. At the last Supercomputing conference I probably saw at least a dozen vendors of immersion cooling equipment. My datacenter has one cluster with liquid cooling caps over the sockets, and two immersed clusters. The latter two have basins of various degrees of sophistication under them for when they do spring a leak.
Not quite. Every modern BLAS is (likely) based on Kazushige Goto's implementation, and he was indeed at TACC for a while. But BLIS, probably the best open-source implementation, is from UT Austin and not connected to TACC.
You use a lot of scare quotes. Do you have any suggestions for how things could be different? You need batch jobs because the scheduler has to wait for resources to become available. It's kinda like Tetris in processor/time space. (In fact, that's my personal "proof" that workload scheduling is NP-complete: it's isomorphic to Tetris.)
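For fun, the Tetris analogy can be made literal with a toy first-fit scheduler: jobs are rectangles (nodes wide, time steps tall) packed onto the cluster "board". This is my own illustrative sketch, not any real scheduler's algorithm or API:

```python
def first_fit(jobs, total_nodes):
    """Place each (nodes, duration) job at the earliest start time
    where enough nodes are free for its whole duration."""
    busy = {}       # busy[t] = number of nodes in use at time step t
    schedule = []   # chosen start time for each job, in input order
    for nodes, duration in jobs:
        t = 0
        # Slide the rectangle down in time until it fits.
        while not all(busy.get(t + d, 0) + nodes <= total_nodes
                      for d in range(duration)):
            t += 1
        # Mark those node-hours as occupied.
        for d in range(duration):
            busy[t + d] = busy.get(t + d, 0) + nodes
        schedule.append(t)
    return schedule

# Three jobs on a 4-node cluster: a 1-node job slots into the gap
# left beside the 3-node job, just like a well-placed Tetris piece.
print(first_fit([(3, 2), (2, 2), (1, 2)], 4))  # -> [0, 2, 0]
```

Real schedulers do far more (priorities, fairshare, backfill), but the packing problem underneath is exactly this.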
And what's wrong with shell scripts? They're a lingua franca, generally accepted across scientific disciplines, cluster vendors, workload managers, .... Considering the complexity of some setups (copy data to node-local file systems, run multiple programs, post-process results, ...) I don't see how you could set things up other than in some scripting language. And then Unix shell scripts are not the worst idea.
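To illustrate, here is a sketch of a typical Slurm job script doing that stage/run/post-process dance. All program names and paths are invented:

```shell
#!/bin/bash
#SBATCH --job-name=sim
#SBATCH --nodes=4
#SBATCH --time=02:00:00

# Stage input data to a node-local file system (paths illustrative):
mkdir -p /tmp/$SLURM_JOB_ID
cp $HOME/input.dat /tmp/$SLURM_JOB_ID/

# Run the parallel program across the allocated nodes:
srun ./simulate /tmp/$SLURM_JOB_ID/input.dat

# Post-process and copy results back to shared storage:
./summarize /tmp/$SLURM_JOB_ID/output.dat > $HOME/results/summary.txt
```

Try expressing that in a static config format and you end up reinventing a scripting language anyway.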
Debugging failures: yeah. There are too many levels where something can go wrong, and it can be a pain to debug. Still, your average cluster processes a few million jobs in its lifetime. If more than a microscopic fraction of those failed, computing centers would need far more personnel than they have.
When used as configuration? Here are some things that are wrong:
* Configuration forced into a single line makes long options inconvenient to write (for example, if you want Slurm with Pyxis and you need to specify the image name, it will most likely not fit on the screen).
* Oh, and since we're mentioning Pyxis: its image names have a pound sign in them, and now you also need to figure out how to escape it, because for some reason, used literally, it breaks the comment parser.
* No syntax highlighting (because it's all comments).
* No way to create more complex configuration: no types other than strings, no variables, no collections of things.
* No way to reuse configuration (you have to copy it from one job file to another). I honestly don't even know what happens if you try to source a job configuration file from another job configuration.
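Concretely, the single-line and pound-sign complaints look like this (Slurm with the Pyxis plugin assumed; the image name is invented):

```shell
#!/bin/bash
# Slurm reads its configuration out of these magic comments:
#SBATCH --nodes=2
#SBATCH --time=01:00:00
# A container image reference cannot be wrapped across lines, and the
# '#' that Pyxis-style image names contain collides with the comment
# syntax, so you get to discover the escaping rules the hard way:
#SBATCH --container-image=registry.example.com#team/project/some-very-long-image:v1.2.3-cuda12.1
srun ./my_program
```

Strings-in-comments is the whole type system.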
All in all, it's really hard to imagine a worse configuration format. This sounds like a solution from some sort of code-golfing competition where the goal was to make it as bad as possible while still retaining some shreds of functionality.
No. Please don't. The book still gets updated regularly, so any copy that is not straight from the repository will quickly go out of date. (I first published this book 6 years ago. You can find pdf copies out on the intertubes that are 200 pages shorter than the most recent version.)
Also, if you link straight to the pdf you don't get to see links to my other books.
Or links to places where you can get a paper copy. Which actually earns me a couple of pennies.
So please: don't make your own link to the pdf file. Don't.
Basics of what? High performance computing? I'd say those are numerical analysis topics, and there are tons of books for that. Unless you can make a case that there are high performance aspects to root finding, I'm not going to include it. (You should have said FFT. That has a very funky interaction with caches and the TLB that absolutely necessitates its inclusion.)
Contact me if you want to discuss the outline of a short section with me. My reason for not adding the hyperbolic case was that it didn't seem to add much computationally to the discussion.
1. Do you know how much time it takes to keep a 600 page book up to the minute?
2. But yeah. I'm going to roll the tutorials into a volume of their own.