Hosted LLMs are heavily subsidised. If you self-host a model and pay for the hardware at cost, you find that the GPU costs are high, and that's before counting the additional tools that OpenAI and Anthropic provide, which must also cost a lot to operate.
Before I started self-hosting my LLMs with Ollama, I imagined that they required a ton of energy to run. I was amazed at how fast my local LLM runs on a relatively inexpensive GeForce RTX 4060 with 8GB of VRAM, using an 8B model. The 8B model isn't as smart as the hosted 70B models I've used, but it's still surprisingly useful.
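For a rough sense of why an 8B model fits on an 8GB card while a 70B model does not, here is a back-of-the-envelope sketch in Python. It assumes weight memory is roughly parameters × bits per weight ÷ 8, and that the local model is quantized to about 4 bits per weight (an assumption on my part; the exact quantization depends on the model tag you pull). It counts weights only and ignores the KV cache and runtime overhead, so real usage is higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Rough estimate only: ignores the KV cache, activations and runtime overhead.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


VRAM_GB = 8  # GeForce RTX 4060 with 8GB of VRAM

for name, params in [("8B", 8), ("70B", 70)]:
    for bits in (16, 4):  # 16-bit weights vs. an assumed ~4-bit quantization
        gb = weight_memory_gb(params, bits)
        verdict = "fits" if gb < VRAM_GB else "does not fit"
        print(f"{name} at {bits}-bit: ~{gb:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

At 16-bit precision even the 8B model's weights (about 16 GB) would not fit, which is why quantization matters so much for cards like the 4060.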