A Short History of Romaji (dampfkraft.com)
59 points by polm23 on Nov 16, 2020 | 45 comments


Not a bad summary of a very complex topic, but one key takeaway is missing: for all practical purposes, despite not being a formally anointed standard, Hepburn "won" the romanization war. It's used in Japanese passports, road signs, railway signs, all major newspapers, JNTO (the tourism org), most textbooks, etc etc.

In second place, particularly in informal contexts, is wapuro (word processor/computer/mobile phone) romaji. "jya" for じゃ is neither Hepburn, Kunreishiki nor Nipponshiki, but it's probably how the average Japanese would write the romaji for that if asked to.

Legit Kunrei/Nipponshiki, despite its notional official status, is dead: nobody writes Japan's tallest mountain as "Huzi".
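To make the comparison concrete, here is a minimal sketch of how the three systems diverge on a handful of kana. The table is a tiny hand-picked subset, and the names `TABLE`, `romanize` and `WAPURO_INPUT` are made up for illustration; the wapuro spellings listed are an assumption about what typical input methods accept, not a standard.

```python
# Romanizations of a few kana on which the systems disagree.
# Columns: (Hepburn, Kunrei-shiki, Nihon-shiki). Toy subset only.
TABLE = {
    "し": ("shi", "si", "si"),
    "ち": ("chi", "ti", "ti"),
    "つ": ("tsu", "tu", "tu"),
    "ふ": ("fu", "hu", "hu"),
    "じ": ("ji", "zi", "zi"),
    "ぢ": ("ji", "zi", "di"),
    "づ": ("zu", "zu", "du"),
    "じゃ": ("ja", "zya", "zya"),
}
SYSTEMS = ("hepburn", "kunrei", "nihon")

# Assumption: spellings a typical input method accepts for じゃ.
# "jya" belongs to none of the three systems above.
WAPURO_INPUT = {"じゃ": ("ja", "jya", "zya")}


def romanize(kana: str, system: str) -> str:
    """Romanize a kana string using the toy table above."""
    col = SYSTEMS.index(system)
    out = []
    i = 0
    while i < len(kana):
        # Try a two-character digraph (e.g. じゃ) before a single kana.
        for size in (2, 1):
            chunk = kana[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk][col])
                i += len(chunk)
                break
        else:
            raise KeyError(f"kana not in the toy table: {kana[i]}")
    return "".join(out)


print(romanize("ふじ", "hepburn"))  # fuji
print(romanize("ふじ", "kunrei"))   # huzi
```

Note that Hepburn and Kunrei both merge じ and ぢ; only Nihon-shiki keeps them apart.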


That's correct. But we do see some mix of the two occasionally.

Also Kunreishiki looks really weird until you look at a hiragana table. Hepburn is closer to how the word would be pronounced if it were a European language (English/Spanish/French/etc) but Kunrei is more "logical" when coming from hiragana.


Nippon shiki has lost utility with a media-influenced nationwide linguistic shift towards Tokyo Japanese. ジ and ぢ had separate pronunciations at one point: see ビルヂング vs modern ビルディング


Still do, in some parts of the country: https://en.wikipedia.org/wiki/Yotsugana


Another complication is that some romanisation schemes are ambiguous with regard to long vowels. 「とお」 and 「とう」 might both be represented as "tō", "too", "toh", or (god help us all) "to". It's also somewhat common for signs to be written in kanji and ambiguous romaji, which changes a quick Google Maps lookup into a dictionary adventure.

A direct mapping from kana to romaji produces "tou", which is not how it's pronounced[0], but does make it much easier to type in.

The classic example is 「東京」/「とうきょう」, pronounced "Toukyou", romanized as "Tokyo".
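A rough sketch of that many-to-one step, assuming the input is already a direct kana-to-letters spelling like "toukyou". The function name `to_macron` is invented for this example, and the rule is deliberately naive: it ignores morpheme boundaries, so it would wrongly merge the two vowels of a verb like 追う.

```python
# Naive conversion from a direct kana spelling ("toukyou") to a
# macron spelling. Hypothetical helper, not any library's API.
LONG_VOWELS = (
    ("ou", "\u014d"),  # おう -> ō
    ("oo", "\u014d"),  # おお -> ō
    ("uu", "\u016b"),  # うう -> ū
)


def to_macron(direct: str) -> str:
    """Collapse doubled vowels into macron vowels (no morphology)."""
    for pair, long_vowel in LONG_VOWELS:
        direct = direct.replace(pair, long_vowel)
    return direct


print(to_macron("toukyou"))  # とうきょう
print(to_macron("tou") == to_macron("too"))  # とう and とお collapse
```

The second line prints True: once you go to macrons, the とう/とお distinction in the kana spelling is gone, which is why the direct spelling is the one that round-trips on a keyboard.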

[0] I am currently studying Japanese for "it would be nice to be able to read my bank statements" purposes, and the teacher has spent a great deal of time emphasizing that pairs like 「とう」 and 「とお」 have the same pronunciation. But try to convince someone that the name John ought to be written 「ジャン」 and you'll get the blankest stares.


> But try to convince someone that the name John ought to be written 「ジャン」 and you'll get the blankest stares.

That sounds like an American pronunciation variant with the 'o' shifting towards 'a'.

Common given names tend to have a fixed romanization though. See ENAMDICT for a bunch of Johns:

http://nihongo.monash.edu/cgi-bin/wwwjdic?1C

You can vary your romanization of your own name to what actually represents it most closely, but with a very common name like John it would be rather awkward.


The correct way to write Tokyo in Hepburn romaji is actually Tōkyō (to indicate the longer "o"). However it's rarely romanized this way.

At some point we have to admit that especially for names (of places, people) we don't need to have a "1 to 1" conversion between hiragana and romaji. People pick the name that works well in English or other alphabet-based languages rather than a direct conversion.

People who can't speak Japanese are not going to pronounce names correctly anyway.


This is actually a pet peeve of mine. You can find countless examples of people in the USA going out of their way to speak an Italian, French, or Spanish word with the correct Italian, French, or Spanish pronunciation. SNL even did a skit about it.

https://www.nbc.com/saturday-night-live/video/enchilada/n997...

I know that skit is a joke but it came about because of events related to Latin America being in the news and all the newscasters going out of their way to use the correct pronunciation.

It's got absolutely nothing to do with saying a word how it looks in English because there are plenty of European origin words that are pronounced far differently than they would be following English rules. "Pizza" being a simple example.

I still hear people doing it regularly in the news, bars, restaurants. But I have never in my life seen any American ever give the same effort to Asian languages. Simple easy examples: there is no "ee" sound in "Tokyo" or "Kyoto".


They're actually all ambiguous, because so is Japanese writing. The おう of 王 is a long O, the おう of 追う is two separate vowels (O+U).


I think you meant it's pronounced "Tookyoo" (but entered as "Toukyou" on a keyboard) :-)

As for Katakana versions of English words, honestly, they're kind of all over the place. You eventually develop a decent instinct for how it works, but I think some details really just have no consistency.


I don't know if it's my imagination, but the leading mora of 東京 sounds much closer to 藤 than it does to 遠い or 十日. It's not a stress difference, more like there's sort of an ... umlaut ... in there? At least, I've had much more than 50/50 luck guessing the difference when looking up unfamiliar vocabulary from TV or YouTube.

Oddly, Japanese as a second language has a completely different curriculum (including idiosyncratic labels for grammatical constructs) so for all I know this is yet another well-known linguistic quirk taught to third graders but excluded from JSL textbooks.


I think you might be right that there is some nonzero difference in the pronunciation of words containing (o)う vs (o)お sometimes, but it's not pronounced as a proper "ou" most of the time; if you need to pick one, you'd go with straight "oo".

I just went through a bunch of examples with a Japanese friend and we can't really find a consistent difference between おう and おお, but it seems we agree that trying to pronounce either too far into "ou" land sounds wrong... at least some of the time... It's kind of tricky :-)


Some thoughts—Hepburn is a ridiculous system and it only makes sense to English speakers. Like—why is the /ʧ/ sound romanized as “ch”? It’s absurd—the only justification for it is that English does it the same way. The only reason to use Hepburn is if you have no interest in learning Japanese and just want to be able to read a couple words out of a phrasebook, or read the name of a person or place.

The JSL romanization, which is mentioned only offhand, is actually pretty interesting if you are interested in learning pronunciation. For example, nihôn means “Japan” and nîhon means “two long cylindrical things”. They are pronounced differently, so why not write them differently? Other romanizations write them both as “nihon” which is ambiguous.


If you're serious about wanting to learn Japanese, the first thing you need to do is remove the training wheels and start using kana. Romaji is aimed squarely at people who can't and don't intend to read Japanese, and for them "chi" is much closer to /ʧi/ than Kunrei's "ti", even though it violates the symmetry of the た行 and thus obscures some verb conjugations the casual reader won't care about.

As for pitch accents, Japanese is only very weakly tonal and what tone there exists is regional to boot, so nobody is going to get confused if you order nihon of beer with the wrong tone. (This is manifestly not the case for, say, Chinese.)


Japanese is not weakly tonal, that is a giant myth. While the accent patterns have regional variations, if you intend to actually communicate in Japanese beyond ordering stuff at restaurants, you are going to sound very awkward if all your pitch accent patterns are wrong. There are many homophones only differentiated by pitch accent too, which can get really confusing.

Not teaching pitch accent is one of the greatest failings of most Japanese courses and textbooks.


Sure, it's a thing, but the average adult learner of Japanese, especially if monolingual English speaking, will have far greater trouble with vowels (short vs long), the "r" sound, glottal stops (especially ん), etc. By the time you've mastered all this, you'll also pick up the basics of pitch just from exposure.

Japanese also abounds in perfect homonyms and context is almost always enough to distinguish "captain" from "enema".


Hepburn is not for foreigners learning Japanese. If you're learning Japanese you should start with hiragana and never use romaji in your learning.

Hepburn is for international use of Japanese names, for visitors (who might not learn Japanese if they're just doing a two-week trip) and Japanese people abroad.

If you were Japanese in the US (or Germany, or Russia), would you rather be Mr. Fujiwara from Kyushu or Mr. Huziwara from Kyusyu? What's going to get people to pronounce your name as close as possible to your real name?


Obviously you would need a different Romaji for Germany... and something even more different for Russia, which uses cyrillic!


Speaking as someone who does in fact speak, read, and write Japanese, I would like to preface this by saying that I greatly enjoy and continue to use the Hepburn system.

I think there can be two purposes to a romanization system for any given language, and more generally, any system used to transliterate from one language to another.

The first, which you're getting at, is accurate representation of the original phonetics to as precise a degree as possible. Obviously this has value in situations such as the one you point out, where two similar items have subtly different pronunciations.

The second is to be useful to a non-native speaker with possibly not even a passing familiarity with the language. I don't know about you, but I don't know what /ʧ/ is, and I surely don't expect my friends, or your run-of-the-mill traveler to Japan to know it either. I'm also not a fan of accented letters like 'ô', which in a romanization, inches closer to IPA territory. Surely it's my American perspective speaking, but accented letters are, to me, devoid of information and I can't help but interpret them as noise rather than signal.

I argue that if precision is your most desired quality, then you may as well go whole-hog and just adopt IPA for your romanization needs. Linguists have figured out how to represent sounds with precision, so we may as well piggyback on their hard work.

If however, you think that a romanization should be useful rather than precise, then Hepburn makes quite a bit of sense, barring its occasional quirks like 'wo' for 'を'. The system has traded precision for ease of use. Was some information lost in the process? Yes, absolutely. Intonation is gone, for instance. But although intonation exists in the language, it just isn't that important in Japanese.

Textbooks for other languages have gone with more precise romanization systems and suffered for it. I've gone through some Korean textbooks, and they generally all use a weird combination of IPA and some uncommon (to us non-linguists) ways of romanizing hangeul, which made the initial few weeks of learning depressingly difficult. There was a clear overfocus on precision of pronunciation, but as an introductory level learner, I just didn't care for that level of accuracy. A system that is easy to understand and pronounce at first sight at the cost of some phonetic accuracy is well worth it for the casual observer. When extremely high accuracy of pronunciation becomes important to you, IPA is always around to show you how it's really supposed to sound.


> I'm also not a fan of accented letters like 'ô', which in a romanization, inches closer to IPA territory.

Japanese has ten vowels ā, ē, ī, ō, ū, a, e, i, o, and u. If you delete the macron over the vowels, you just lost half of the vowels of Japanese. It would be like writing English without the silent e (e.g. mat, mate would both be spelt "mat") because silent e is "devoid of information" and "noise rather than signal."
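To show what gets lost, here is a small sketch that strips macrons and reports which words collide afterwards. The word list is a hand-picked handful of Hepburn spellings and the helper names are made up for this example; only the standard-library `unicodedata` module is used.

```python
import unicodedata
from collections import defaultdict

COMBINING_MACRON = "\u0304"


def strip_macrons(text: str) -> str:
    """Remove macrons by decomposing, dropping U+0304, recomposing."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch != COMBINING_MACRON)
    return unicodedata.normalize("NFC", kept)


# Hand-picked minimal pairs (Hepburn with macrons) and rough glosses.
WORDS = {
    "obasan": "aunt",
    "obāsan": "grandmother",
    "yuki": "snow",
    "yūki": "courage",
    "toru": "to take",
    "tōru": "to pass through",
}


def collisions(words):
    """Group words that become identical once macrons are removed."""
    groups = defaultdict(list)
    for word in words:
        groups[strip_macrons(word)].append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


for plain, group in collisions(WORDS).items():
    print(plain, "<-", [WORDS[w] for w in group])
```

Every pair in this list collapses into a single spelling, which is the mat/mate situation described above.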


> But although intonation exists in the language, it just isn't that important in Japanese.

I was with you until this part. While it is certainly true that intonation (or rather pitch accent) is not written in Japanese, it is absolutely an important part of the language. If your pitch accent is wrong, people will have a harder time understanding you, or may not understand you at all, especially if you make other mistakes on top of it or are using a word that means different things depending on the pitch accent.

The never-ending myth that "Japanese is flat" or that pitch accent does not matter in Japanese is one of the primary causes of Japanese learners ending up with poor pronunciation and difficulty sounding natural. It's like having all the accents wrong in Spanish (my native language); sure, people will eventually understand you, but it's just awkward and not correct. Except it's worse in Japanese due to the much higher occurrence of homophones only differentiated by pitch accent.


I've started learning Japanese but moved to Chinese (Taiwanese Mandarin). It's a similar situation:

There is the official Hanyu pinyin romanization which is "technically correct", but nobody knows how to pronounce "qiu" or how to type "lǚ". And there are other romanization schemes which aim for ease of pronunciation by people who don't know any Chinese.

It's more complicated than that and somewhat politicized, but in the end, different romanization systems make sense in different contexts.


The new AZERTY keyboard might allow typing "lǚ". I'll have to check.

EDIT : yep, looks the same at least… : ǚ

http://norme-azerty.fr/


> Surely it's my American perspective speaking, but accented letters are, to me, devoid of information and I can't help but interpret them as noise rather than signal.

As a francophone the difference between "e" and "é" _cannot_ be ignored - and their pronunciation is also different. Likewise, accents are signal in Spanish.

However, when reading romanized Japanese I also dislike the use of accents, mostly from a lack of norm indicating which accent corresponds to which pronunciation.


> why is the /ʧ/ sound romanized as “ch”?

There is no [tʃ] in Japanese, but English speakers approximate [tɕ] as [tʃ], so writing it as ⟨ch⟩ allows English speakers without knowledge of Japanese to do their best attempt at pronouncing e.g. someone's name. Given the dominance of English in international communication, it's no surprise Hepburn won. "Makes sense to English speakers" is a feature, not a bug.


To me it was a bug, because my language (Polish) has both tɕ (represented as "ć") and the English approximation ("cz").

It took a book written by a Polish person who used to live in Japan to find out how "sushi" is actually pronounced.


Tell me more


The choice of `ch` is clearly inspired by English, but the other common alternatives (Nihonshiki, Kunreishiki) are not self-consistent in that they paper over differences in pronunciation. To be consistent you need to represent ち as something other than `ti`, so then why not `chi`?

At this point anything other than Hepburn and Nihonshiki/Kunreishiki is a non-starter due to lack of familiarity, so I would be happy if everyone would just get on board with Hepburn.


> To be consistent you need to represent ち as something other than `ti`

Why? What else would you be representing as 'ti'? たちつてと is the related line (e.g. verbs ending in ち become て when you put them in potential form) so romanising them as ta/ti/tu/te/to makes sense.
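A quick sketch of the symmetry in question, using the verb 待つ as the example. The names `TA_ROW`, `onsets` and `matsu_stems` are made up for illustration, and "consonant" here just means everything before the final vowel letter.

```python
# The たちつてと row in two romanization systems.
TA_ROW = {
    "hepburn": ("ta", "chi", "tsu", "te", "to"),
    "kunrei": ("ta", "ti", "tu", "te", "to"),
}


def onsets(system: str) -> set:
    """Consonant spelling of each syllable (all but the final vowel)."""
    return {syllable[:-1] for syllable in TA_ROW[system]}


def matsu_stems(system: str) -> list:
    """Stem forms of 待つ: 待た-, 待ち-, 待つ, 待て-, 待と-."""
    return ["ma" + syllable for syllable in TA_ROW[system]]


print(onsets("kunrei"), matsu_stems("kunrei"))
print(onsets("hepburn"), matsu_stems("hepburn"))
```

In the Kunrei output the stem is always "mat" plus a vowel; in Hepburn the same stem shows up spelled three different ways (mat, mach, mats).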


It makes sense if you're aiming for a faithful transliteration of the original syllables, which is what Nihonshiki does. However, if you want a phonemic transcription that can be read by non-native speakers, then it makes sense to do what Hepburn does and accurately represent the sounds (not the writing) of the language, where the consonant in ち is not the same as the one in た.


> if you want a phonemic transcription that can be read by non-native speakers, then it makes sense to do what Hepburn does and accurately represent the sounds (not the writing) of the language

When you say "non-native speakers" you mean "monoglot English speakers". Hepburn represents the sounds in a way that corresponds to English peculiarities; it's easier to read for English speakers and English speakers only (speakers of non-English languages with Latin orthographies find Nihonshiki easier). It makes sense for teaching materials aimed specifically at an English-speaking audience, but not on e.g. signs aimed at international tourists.


In what non-English language is "Cyasyu" easier to read than "Chashu"?

FWIW, I'm a native speaker of a non-English language with a Latin orthography, and while the language doesn't use the English digraphs "ch" or "sh", those are still widely understood and found in loanwords etc.


> In what non-English language is "Cyasyu" easier to read than "Chashu"?

I was going by my experience of Irish and German, where "ch" has a specific pronunciation that is quite different from that Japanese sound. Having found https://commons.wikimedia.org/wiki/File:Pronunciation_of_CH_... I guess I was generalising too much myself :).



Using the Latin alphabet in no way means "buying into the idea that each letter has a particular pronunciation." This is very obvious if you speak Irish, but even in English there are phonemes that are written as digraphs (most obviously th but also ch and sh - which is why Hepburn seems non-awful to people whose native language is English) - no Latin orthography that's in live use as a spoken language is, or could be, "consistent" in your sense.

The 50音表 regularity is real and meaningful (e.g. it shows up in verb inflections); any representation that lacks it is necessarily less faithful to the native experience. Whereas the letter-sound correspondence in Hepburn is not consistent in the first place ("chi" for ち is ridiculous to anyone who speaks a non-English language), so the idea that Nihonshiki is somehow less consistent is simply wrong.


Paul’s Dampfkraft blog has been on the front page of HN quite a lot lately and I’ve enjoyed every post a lot.

I’m wondering if this has helped him generate new leads or clients.

Polm23, if you are reading, do you have any comments on business impact you are willing to share publicly?


Glad you enjoy the articles. I recognize your handle from other posts here, have we met somewhere else?

I've had clients mention seeing my website or seeing my name in articles elsewhere, but I don't remember any mentioning HN. I've also gotten no reaction the few times I've posted on the hirings posts, even with an excellent endorsement one time. On the other hand people do find my site through search engines and I assume HN links help boost rankings there.


> ローマ字ひろめ会 (Romaji Hirome Kai […])

Rōmaji, not romaji¹. The macron (long dash above) over the vowel indicates that it is a long vowel sound (roughly twice the length of, for example, the ma or ji following it). The purpose of romanization is to get text that can be pronounced roughly correctly by someone not fluent in reading Japanese kanji and kana.

Dampfkraft is knowledgeable on these topics, which makes the complete absence of macrons in his romanized words doubly conspicuous.

1: Unless you are using it as a loanword in English, but here it was a literal romanized transcription.


This is fascinating, I've done a few tools around Japanese but never so deep as this. I find this conversion to the full foreign word a bit odd though:

カツ丼。豚カツ。カツカレー。カツ。

Katsudon. Tonkatsu. Cutlet curry. Cutlet.

The library doesn't seem too complex (except for the many unknown imports), I might try to port it to JS at some point


A similar critique featured heavily during Polm's announcement of cutlet on HN.

I never quite understood the reasoning either: https://news.ycombinator.com/item?id=23800805


It's natural though. I could probably say romanization is a casual, reasonable-effort method to convey a word so that both sides can agree on the word in the source language with reasonable accuracy.

Uncommon/novel/excessive romanizations like "katsukaree" or "hakkar nywusu" can be technically correct or more precise, but if they can't be traced to the word or concept at less effort for an average audience than "cutlet curry" or "hacker news", they actually serve the purpose less.

For real world uses, Google Search[1] in Japanese for "cutlet curry" gives me an info panel for cutlet, a dictionary page as top result, as well as multiple Japanese videos about cutlet curry (why anyone would take and upload those kinds of videos is a mystery to me). In contrast, for "katsukaree", it gives me a "maybe you meant..." mini panel, and the top three results are an Instagram tag page, an English Wikimedia Commons link, and a Tripadvisor page for a restaurant in Columbia. I checked the tag results page but none of the top or newest posts seemed to be from someone in the country. I think it's safe to say "katsukaree" is quite an unpopular expression among native Japanese speakers.

[1]: https://imgur.com/jdPZN7m


As a non-native English speaker who lives in Japan and speaks some Japanese (and I'd guesstimate that the majority of foreigners in Japan who speak Japanese aren't fluent in English!) I know what "katsu" is but I would not know for the life of me what "cutlet" means. I even just typed "cutley" and had to look up to correct it! So it's indeed a lot more helpful for me to maintain "katsu" than "cutlet".


I don't think anything is gained by saying "Karee", but "katsu curry" will give you better results; it's pretty much a recognised word in English now.


cutlet pulls in MeCab, a medium sized C++ library, and at minimum a 250MB dictionary. I think people have run MeCab through emscripten before; you could stick it in an electron app but I wouldn't put it in a browser. More power to you if you can get it working but I think it'd be a big project.


Okay it seems you did the heavy search for me, thanks! Now that I think of it this is not a katakana/hiragana => romaji, this is a full kanji/kana => romaji, so yeah it should not be trivial/small at all.



