Uhh... I hate to be the one to ask this, but shouldn't we focus on making LLMs work well first, and only then on optimizations? Using everyone's car analogy, it's like making sure early cars burn less coal. It's a fool's errand.
Coal (and even wood!) powered cars actually existed long before Ford, but didn't take off because they were too heavy and unwieldy. The Model T was the result of a century of optimization.
Also, making neural networks faster/cheaper is a big part of how they advance.
We've known about neural architectures since the 70s, but we couldn't build them big enough to be actually useful until the advent of the GPU.
Similarly, the LLM breakthrough came because someone decided it was worth spending millions of dollars to train one. Efficiency improvements lower that barrier for all future development (or alternatively, let us build even bigger models for the same cost).
Cheaper compute is basically a prerequisite to making better models. You can get some improvements on the margins by making algorithms better with current hardware, but not an order of magnitude improvement.
When there is an order of magnitude improvement in hardware, the AI labs will figure out an algorithm to best take advantage of it.
The optimizations described could easily apply to other models, not just transformers. Following your analogy, this is like optimizing the plumbing, pistons, and valves on steam engines; it could be useful for whatever follows.
You're also welcome to contribute. There are many people doing many things at once in this space; I don't think experiments like this are a problem at all.