I think everybody who has to write applications that deal with URLs as core identifiers has asked this. It's also hairy because the leading part of a URL (protocol and hostname) is case-insensitive but the trailing part (path, query string and fragment) isn't. At my last job I created a framework for normalizing and canonicalizing URLs, as well as storing them consistently (with the hostname components reversed). It was a big improvement for retrieval and duplicate-detection accuracy and performance.
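To make the idea concrete, here is a minimal Python sketch of that kind of normalization. It is not the framework itself. The function names `normalize_url` and `storage_key`, the default-port table, and the exact key format are all assumptions for illustration, and it ignores userinfo and IPv6 literal hosts. It lowercases only the case-insensitive parts, drops default ports, and builds a key with the hostname labels reversed so that URLs from the same domain sort next to each other.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed table of default ports; extend as needed for other schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop default ports, keep the rest as-is.

    Illustrative only: userinfo is dropped and IPv6 hosts are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostname is case-insensitive
    port = parts.port

    netloc = host
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{port}"

    # Path, query and fragment may be case-sensitive, so leave their case alone.
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def storage_key(url: str) -> str:
    """Return the normalized URL with hostname labels reversed (hypothetical format)."""
    parts = urlsplit(normalize_url(url))
    host, sep, port = parts.netloc.partition(":")
    reversed_host = ".".join(reversed(host.split(".")))
    return urlunsplit(
        (parts.scheme, reversed_host + sep + port, parts.path, parts.query, parts.fragment)
    )


print(normalize_url("HTTP://WWW.Example.COM:80/Some/Path?Q=1#Frag"))
# http://www.example.com/Some/Path?Q=1#Frag
print(storage_key("HTTP://WWW.Example.COM:80/Some/Path?Q=1#Frag"))
# http://com.example.www/Some/Path?Q=1#Frag
```

With reversed hostnames, a sorted index or a prefix scan on `http://com.example.` picks up every subdomain of `example.com` in one range, which is one plausible reason this layout helps retrieval.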