A lot of the voice quality problems can be attributed to forcing cellphones to route through the PSTN[1], which carries 8-bit μ-law audio sampled at 8 kHz. That's a far cry from CD-quality audio: 8 kHz was once considered roughly the minimum needed for intelligible speech, and apart from the μ-law companding, it's uncompressed. You would get much better audio quality by recording voice at a higher sampling rate and encoding it as a 64 kbps MP3, which is the same bitrate as the PSTN and trivial for cellphone hardware to do. However, cellphones have to be able to call landlines and other cellphones, and the common point of connectivity (and so the lowest common denominator) is the PSTN. VoIP can use even better audio codecs that deliver quality far superior to anything you'd hear through the PSTN. This is why, for instance, many reviewers of FaceTime report that activating it noticeably improves the audio quality of their call, on top of all the niceties of the video stream.
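For a sense of what 8-bit μ-law at 8 kHz means in numbers, here is a minimal Python sketch. It applies the continuous μ-law curve (μ = 255) and rounds the result to a signed 8-bit code. Real G.711 uses a segmented, bit-exact approximation of this curve, so treat the quantizer and the helper names as illustrative assumptions. It also works out the raw bitrates: 8,000 samples/s × 8 bits = 64 kbps for the PSTN, versus 44,100 × 16 × 2 = 1,411.2 kbps for stereo CD audio.

```python
import math

MU = 255  # mu value used by G.711 mu-law telephony

def mulaw_compress(x: float) -> float:
    """Continuous mu-law curve: maps x in [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def encode_8bit(x: float) -> int:
    """Hypothetical quantizer: companded value rounded to a signed code in [-127, 127]."""
    return round(mulaw_compress(x) * 127)

def decode_8bit(code: int) -> float:
    """Map a signed code back to a linear sample in [-1, 1]."""
    return mulaw_expand(code / 127)

def bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    return sample_rate_hz * bits_per_sample * channels

pstn = bitrate_bps(8000, 8)       # 64,000 bit/s
cd = bitrate_bps(44100, 16, 2)    # 1,411,200 bit/s
print(pstn, cd, cd / pstn)        # CD audio carries about 22x the raw data

# A quiet sample: compare mu-law against plain 8-bit linear PCM.
quiet = 0.01
mu_err = abs(decode_8bit(encode_8bit(quiet)) - quiet)
lin_err = abs(round(quiet * 127) / 127 - quiet)
print(mu_err < lin_err)
```

The point of companding is that quiet samples keep far more precision than they would with plain 8-bit linear PCM, which is why 8 bits is tolerable for speech at all.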
It's not just humans that have trouble hearing at 8 kHz 8-bit μ-law. Voice transcription software is markedly more accurate with a computer microphone recording at 44.1 kHz 16-bit than over the phone. This is part of the reason, for instance, that Google Voice transcription is nowhere near as effective as Dragon NaturallySpeaking on your laptop, and never will be.
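A quick way to see what 8 kHz sampling throws away: a sampled signal can only represent frequencies below half the sample rate (the Nyquist frequency), which is 4 kHz for the phone network versus 22.05 kHz at 44.1 kHz. This Python sketch samples a 5 kHz tone at 8 kHz with no filter and shows it produces exactly the samples of a phase-flipped 3 kHz tone, so the phone network has to filter that content out before sampling rather than keep it. The 5 kHz "hiss" frequency and the helper names are illustrative assumptions, not measurements of real speech or of any transcription engine.

```python
import math

def sample_tone(freq_hz: float, sample_rate_hz: int, n_samples: int) -> list[float]:
    """Sample a unit sine wave at freq_hz with no anti-aliasing filter."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(n_samples)]

def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def same_up_to_sign(a: list[float], b: list[float]) -> bool:
    """True if sequence a is sequence b negated, i.e. the same tone phase-flipped."""
    return all(math.isclose(x, -y, abs_tol=1e-9) for x, y in zip(a, b))

HISS_HZ = 5000.0  # illustrative stand-in for the high-frequency energy of an "s"

# At 8 kHz, 5 kHz lies above Nyquist and folds down onto 8000 - 5000 = 3000 Hz.
at_phone_rate = same_up_to_sign(sample_tone(HISS_HZ, 8000, 16),
                                sample_tone(8000 - HISS_HZ, 8000, 16))
# At 44.1 kHz the same two tones stay distinguishable.
at_cd_rate = same_up_to_sign(sample_tone(HISS_HZ, 44100, 16),
                             sample_tone(8000 - HISS_HZ, 44100, 16))

print(nyquist_hz(8000), nyquist_hz(44100), at_phone_rate, at_cd_rate)
# prints: 4000.0 22050.0 True False
```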
It's not just the μ-law backend, though; it's also the codecs used on the cellphone network. For example, GSM and 3G (UMTS) networks support AMR-WB, which is reportedly fairly good quality, but the carriers don't have to enable it. Some European carriers do, but they face actual competition. I'm pretty sure AT&T, for one, limits its network to plain AMR, at roughly half the bitrate.
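To put rough numbers on AMR versus AMR-WB, here is a small Python table of their codec modes. The mode bitrates are as I understand them from the 3GPP AMR and AMR-WB specifications, and the passbands are approximate; which modes a given carrier actually enables isn't modeled. Comparing the top modes, narrowband AMR tops out at about half the bitrate of AMR-WB, and it samples at exactly half the rate.

```python
# Codec mode bitrates in kbit/s (3GPP AMR and AMR-WB); passbands are approximate.
AMR_NB_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]
PSTN_KBPS = 64  # G.711: 8,000 samples/s x 8 bits

CODECS = {
    # name: (sample rate in Hz, approximate audio passband in Hz, mode bitrates)
    "AMR (narrowband)": (8000, (300, 3400), AMR_NB_MODES_KBPS),
    "AMR-WB (wideband)": (16000, (50, 7000), AMR_WB_MODES_KBPS),
}

for name, (rate_hz, (low_hz, high_hz), modes) in CODECS.items():
    top = max(modes)
    print(f"{name}: {rate_hz} Hz sampling, ~{low_hz} to {high_hz} Hz audio, "
          f"{min(modes)} to {top} kbit/s (PSTN uses {PSTN_KBPS / top:.1f}x the top mode)")

# Ratio of the highest wideband mode to the highest narrowband mode.
ratio = max(AMR_WB_MODES_KBPS) / max(AMR_NB_MODES_KBPS)
print(f"AMR-WB / AMR top-mode bitrate: {ratio:.2f}")
```

Note that both codecs squeeze speech far below the PSTN's 64 kbps; the wideband gain comes mostly from sampling at 16 kHz, not from spending more bits.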
FaceTime can do better because Apple is bypassing the carrier network altogether.
[1] http://en.wikipedia.org/wiki/Public_switched_telephone_netwo...