None of the current use cases, the ones that caught the public eye, actually need a locally run LLM. Apple has to come up with functionality that genuinely benefits from on-device models, and that's hard to do: there aren't many candidates, since the input paths all route through an app or the camera anyway. Even then, a full-fledged LLM in the cloud will generally beat a quantized, low-precision one running locally. Yeah, more on-device compute helps, but it's not a silver bullet, because vision- and audio-bound models need large amounts of memory, not just raw compute.
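
For a rough sense of the memory side, here's a back-of-envelope sketch (the 7B parameter count is just an illustrative assumption, not any specific Apple or shipped model):

    # Back-of-envelope weight-memory estimate (illustrative only; the 7B size
    # and precision choices are example numbers, not tied to any real product).
    def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
        """Approximate memory needed just to hold the weights, in GB."""
        return n_params * bits_per_param / 8 / 1e9

    for label, bits in [("fp16 (full precision)", 16), ("int8", 8), ("int4", 4)]:
        print(f"7B model @ {label}: ~{weight_memory_gb(7e9, bits):.1f} GB")

    # fp16: ~14 GB, int8: ~7 GB, int4: ~3.5 GB -- and that's before the KV cache
    # or the extra encoder weights a vision/audio model drags in.

Even at 4-bit you're looking at several GB of unified memory just sitting there for the weights, which is a big ask on a phone that's also running everything else.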