Uhh... I hate to be the one to ask this, but shouldn't we focus on making LLMs work well first, and only then on optimizations? Using everyone's car analogy, it's like making sure early cars burn less coal. It's a fool's errand.
Coal (and even wood!) powered cars actually existed long before Ford, but didn't take off because they were too heavy and unwieldy. The Model T was the result of a century of optimization.
Also, making neural networks faster/cheaper is a big part of how they advance.
We've known about neural architectures since the 70s, but we couldn't build them big enough to be actually useful until the advent of the GPU.
Similarly, the LLM breakthrough came because someone decided it was worth spending millions of dollars to train one. Efficiency improvements lower that barrier for all future development (or alternatively, let us build even bigger models for the same cost).
Cheaper compute is basically a prerequisite to making better models. You can get some improvements on the margins by making algorithms better with current hardware, but not an order of magnitude improvement.
When there is an order of magnitude improvement in hardware, the AI labs will figure out an algorithm to best take advantage of it.
The optimizations described could easily apply to other models, not just transformers. Following your analogy, this is like optimizing the plumbing, pistons, and valves on steam engines; it could be useful for whatever follows.
You're also welcome to contribute. There are many people doing many things at once in this space; I don't think experiments like this are a problem at all.