When did you last use it? I used it a few weeks ago to create a fake podcast as a side project, and it sounded pretty good with their highest-end model and the settings cranked up.
My point isn’t necessarily whether ElevenLabs is good or bad; it’s the difference between its text-to-voice and voice-to-voice generations. The latter is incredibly expressive and just shows how much is lacking in our ability to encode inflection in text.