
Imagine if DNS consisted only of Arabic characters. Imagine how grumpy you'd be that you couldn't go to newyork.gov.us for your local government website, and had to go to نيويورك instead.

Assuming you can't read Arabic, how on earth would you even recognise that address, let alone remember how to type it?



Well, I'd probably have to learn something like this: https://en.wikipedia.org/wiki/Arabic_chat_alphabet

I always assumed the non-latin-alphabet people know their way around this because they already used the internet before unicode-domains.


> I always assumed the non-latin-alphabet people know their way around this because they already used the internet before unicode-domains.

But that's not a very good reason to continue it (besides which, there are new people coming to the Internet every day).


Those (Arabizi and other romanizations) are in wide use now specifically because the Latin alphabet was the de facto standard in the computer world. In the hypothetical above, that would likely not be the case, and you would have to use actual Arabic script.


In Japan there's a trend to use phone numbers as domain names, probably for this very reason.


Even then those numbers are Western. I guess it's easy enough to learn - I'm useless at languages but learnt to recognise Arabic and Urdu numerals. At least almost every culture in the world has a base 10 system for encoding numbers -- learning Gujarati numerals in addition to your native number system is trivial.


Interesting. I agree that Latin isn't sufficient, but they could have just extended the list of allowed characters by a few other defined alphabets. No need to include every(?) Unicode code point just because it's the technically more elegant solution.


Allowing only a few alphabets doesn't solve much (Cyrillic "а" still looks like latin "a" in most fonts) but opens new cans of worms: how do you update this list, why isn't alphabet x of important minority y included, etc.
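To make the lookalike point concrete, here is a minimal Python sketch. The helper names `describe` and `scripts_in` are made up for illustration, and guessing the script from the first word of the Unicode character name is only a rough heuristic, not a real script-detection method.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, renders like "a" in most fonts


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def scripts_in(label):
    """Rough heuristic (an assumption of this sketch): take the first word
    of each letter's Unicode name as its script."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


print(describe(latin_a))     # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))  # U+0430 CYRILLIC SMALL LETTER A

# A label that mixes scripts, with the Cyrillic letter in second position.
print(sorted(scripts_in("p\u0430ypal")))  # ['CYRILLIC', 'LATIN']
```

The two strings compare as different even though they look the same on screen, which is why restricting DNS to "a few alphabets" would not remove the problem on its own.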

Much easier to just allow everything in the technical standards and let domain registries set reasonable policies that make sense for that TLD. Sometimes that works well (.de for example has a list of 93 allowed Unicode characters [1] which covers everything a German might plausibly use from German or neighboring countries' alphabets), but some registries just don't seem to care much (e.g. .com).

1: https://www.denic.de/en/know-how/idn-domains/idn-character-l...
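A registry-side check along these lines could be sketched roughly as below. The `ALLOWED` set is a made-up stand-in (ASCII letters, digits, hyphen and four German letters), not the actual 93-character DENIC list, and `disallowed_chars` is a hypothetical helper name.

```python
import string

# Hypothetical allowlist for illustration only; NOT the real DENIC list.
ALLOWED = set(string.ascii_lowercase + string.digits + "-" + "äöüß")


def disallowed_chars(label):
    """Return the characters in a label that are not on the allowlist."""
    return sorted({ch for ch in label.lower() if ch not in ALLOWED})


print(disallowed_chars("müller"))       # []
print(disallowed_chars("p\u0430ypal"))  # ['а'] (the Cyrillic letter)
```

With a per-TLD list like this, a lookalike from a script the registry never expects is rejected at registration time, without the DNS standards themselves having to pick which alphabets count.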


DNS was already 8-bit clean. IDNA was chosen for a complex set of compatibility, management, and user interface reasons.
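For anyone who hasn't seen what IDNA does in practice, here is a small sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules. The domain `bücher.example` is just an illustrative name.

```python
name = "bücher.example"  # illustrative name, not a real registration

# Each non-ASCII label is converted to an ASCII "xn--" (Punycode) form,
# so what goes over the wire is plain letters, digits and hyphens.
ascii_form = name.encode("idna")
print(ascii_form)                 # b'xn--bcher-kva.example'

# The conversion is reversible, so software can show the Unicode form again.
print(ascii_form.decode("idna"))  # bücher.example
```

So the Unicode name only exists at the application layer; resolvers and zone files keep seeing ASCII labels.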


I have a few years of rusty undergrad Arabic fading in my past, and my take is that non-native speakers would struggle with an ad hoc transliteration system like this (imagine a non-native English speaker trying to read l33tspeak...)



