Kanji Club: Search Kanji by Parts with Instant Feedback (dampfkraft.com)
97 points by polm23 on March 29, 2021 | 47 comments


Probably 80% of Japanese students are more interested in making tools to study Japanese than actually studying Japanese. It’s a kind of yak shaving. I know this applies to me. (Which is why my Nihongo is crap.) I’m not sure if this happens with other languages, East Asian or not.


Not totally surprising. It's a natural human impulse (maybe?) to want to build or improve your tools when working on something, especially when it feels like your tools are inadequate. And while there are good resources for learning Japanese, they're few and far between in my experience. And nothing, so far as I know, is completely comprehensive on its own.

Personally, I've never gone ahead and made my own Japanese learning app or anything like that, but while studying I do pretty regularly think, "man, this would be so much better if it did X; I wish I had an app to do Y", etc., etc.

Because of Western attitudes towards Japanese pop culture, a lot of people learning Japanese are...well, they're gonna be kinda nerdy, frankly. And that means they're more likely than average to have the sort of skills that allow them to build (software) tools to help them study.

Throw in that learning Japanese, at least for Westerners, is a monumental task mostly driven by intellectual curiosity (rather than, say, economic opportunity), and it'll attract the sort of ambitious and intellectually curious people who'll actually endeavour to make their own tools and who have the skills to do it.


> Because of Western attitudes towards Japanese pop culture, a lot of people learning Japanese are...well, they're gonna be kinda nerdy, frankly. And that means they're more likely than average to have the sort of skills that allow them to build (software) tools to help them study.

Whoa, you nailed it here. Never thought of it this way.


> while studying I do pretty regularly think, "man, this would be so much better if it did X; I wish I had an app to do Y", etc., etc.

This is more or less exactly why, despite having played with a lot of different tools, I always come back to Anki. The UI is irritatingly unpolished, and it's difficult to learn well, but it's also the only tool out there that's open and flexible enough that I can easily experiment with and tweak what I'm doing without disrupting my overall learning process overmuch.

I've certainly nerdsniped myself a few times, especially when I'm trying to build something from the ground up because I'm currently toying with some other app. With Anki, though, most of my "app to do Y" ends up just being a Jupyter notebook that I hacked together in less than an hour.


> It's a natural human impulse

I don't know about that. It certainly seems common amongst the programmer crowd. But I've found that there's plenty of people capable of doing the "5 hours of routine work" without ever thinking "If I spend like 4 hours engineering this weird hack, then I might get this done really fast" (and then of course ending up wasting days trying to get it to work)


What character decomposition database is this using? It's not as straightforward as it might first seem to say which characters are made of which other ones. Some projects I forked over the years:

I modified this one to use the missed reviews on Kanji Koohii as the difficulty factor to define the optimal path for learning the parts of kanji. https://github.com/scriptin/topokanji

This one shows the parts in a semantic-web-type view. https://github.com/kanji-graph/kanji-graph
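
A rough sketch of the ordering idea described above: learn every part before any kanji that contains it, and among the kanji that are currently learnable, take the easiest first. The decomposition table and difficulty scores are made-up illustration data (not taken from topokanji or Kanji Koohii), and `learning_order` is a hypothetical helper.

```python
import heapq

def learning_order(parts_of, difficulty):
    """Return kanji so that each one comes after all of its parts.

    parts_of:   dict mapping a kanji to the list of parts it contains
    difficulty: dict mapping a kanji to a score (lower = easier); hypothetical
    """
    # Collect every character mentioned, as a whole or as a part.
    nodes = set(parts_of)
    for parts in parts_of.values():
        nodes.update(parts)

    # Count unlearned parts per kanji and record which kanji each part unlocks.
    remaining = {k: len(set(parts_of.get(k, []))) for k in nodes}
    unlocks = {k: [] for k in nodes}
    for kanji, parts in parts_of.items():
        for part in set(parts):
            unlocks[part].append(kanji)

    # Kahn's algorithm, with a heap so the easiest ready kanji comes first.
    ready = [(difficulty.get(k, 0), k) for k in nodes if remaining[k] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, kanji = heapq.heappop(ready)
        order.append(kanji)
        for whole in unlocks[kanji]:
            remaining[whole] -= 1
            if remaining[whole] == 0:
                heapq.heappush(ready, (difficulty.get(whole, 0), whole))
    return order

# Made-up example data.
parts_of = {"訓": ["言", "川"], "淋": ["水", "林"], "林": ["木", "木"]}
difficulty = {"言": 2, "川": 1, "水": 1, "木": 1, "林": 3, "訓": 5, "淋": 4}
order = learning_order(parts_of, difficulty)
```

With this data 木 always comes before 林, and 林 before 淋. Swapping in real missed-review counts would only change the difficulty dictionary.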


> What character decomposition database is this using..

From the about page, it seems that it's using Wikimedia data: https://commons.wikimedia.org/wiki/Commons:Chinese_character... .

To add to your comment, there's also RADKFILE/KRADFILE, which is used by a lot of Japanese dictionaries out there (including jisho.org), and also IDS (Ideographic Description Sequence) data: https://github.com/cjkvi/cjkvi-ids . The latter, I believe, is not meant for general lookup, but nonetheless can be quite informative, such as identifying semantic/phonetic components.
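
For anyone who wants to poke at that data, here is a minimal sketch of reading KRADFILE-style lines into a dictionary. It assumes each entry looks like `kanji : part part ...` with `#` starting a comment line, which is my understanding of the format. The sample text is typed in by hand rather than copied from the real file, and `parse_krad` is a hypothetical name.

```python
def parse_krad(text):
    """Parse KRADFILE-style text into {kanji: [parts]} (format is assumed)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kanji, sep, parts = line.partition(" : ")
        if not sep:
            continue  # not an entry line
        table[kanji] = parts.split()
    return table

SAMPLE = """\
# hand-written sample, not real KRADFILE contents
訓 : 言 川
囚 : 囗 人
"""
table = parse_krad(SAMPLE)
```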


I'm also guilty of this. Related to this thread, some years ago I made a puzzle game with Chinese characters where you get a bunch of components and try to combine them to as many characters as possible: http://www.jiong3.com/pinzi/

It's not just yak shaving though; some programming skills can be really useful when dealing with vocab lists, etc.


Guilty too. I had this idea to improve the order in which I should study kanjis. Ended up with a neo4j instance with kanjis, kanji components (not just radicals), words, and how they are combined.

Now I'm still using the same algorithm, but I do everything manually. It takes time but I found that the flashcard customization aspect makes the memorization easier.


This would also explain why most of those tools have limited usefulness for learning at an advanced level. Anki and Takoboto (for keeping track of words I want to learn) are really all I use right now.


Except me. I don't like making tools, and I am a software engineer, ironically.


Yeah, I saw the headline thinking this is just a feature every Japanese dictionary offers. Surely I must be mistaken, because there's no way yet another person remade this feature yet again.

But no. It's literally one of the basic features of every kanji/hanzi dictionary.


Isn't this a problem SKIP already solves? Many kanji dictionaries already support it and it's also usable in cases where you don't know what a "part" is called.

https://kanji.sljfaq.org/skip-help.html https://kanji.sljfaq.org/skip.html


SKIP is cool but you have to learn it, whereas this just uses knowledge you already have.


SKIP is about matching the kanji to a pattern and counting the strokes of the two portions. This is more about inputting the kanji's constituent parts themselves.

For example, say we have the kanji 訓. For a SKIP-based lookup, you'd see this as 言|川, and you probably know that's 7 and 3 strokes. Whereas with this approach, you could type in the parts, e.g. いう (backspace) to get 言 and かわ to get 川. A lot faster when kanji have many parts or you're not so sure about the stroke counts. Yes, SKIP would be more helpful if you don't know the parts.
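
To make the contrast concrete, here is a toy sketch of both lookups over a three-kanji table. The SKIP codes follow the pattern-plus-two-stroke-counts shape described above (1 meaning a left-right split), but the table is hand-made illustration data and both function names are hypothetical.

```python
# Hand-made illustration data: kanji -> (SKIP code, set of parts).
KANJI = {
    "訓": ("1-7-3", {"言", "川"}),
    "託": ("1-7-3", {"言", "乇"}),
    "計": ("1-7-2", {"言", "十"}),
}

def by_skip(code):
    """All kanji sharing a SKIP code (pattern and stroke counts)."""
    return {k for k, (skip, _) in KANJI.items() if skip == code}

def by_parts(parts):
    """All kanji containing every one of the given parts."""
    wanted = set(parts)
    return {k for k, (_, have) in KANJI.items() if wanted <= have}

skip_hits = by_skip("1-7-3")   # stroke counts alone can be ambiguous
part_hits = by_parts("言川")    # naming the parts narrows it down
```

In this toy table the SKIP code matches both 訓 and 託, while the two parts pick out 訓 alone.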


In Chinese you have input methods that do exactly that. In particular Array 40 [1], which maps pieces of characters to each key on the keyboard to input them.

Other such methods, such as the more popular Cangjie [2] or Boshiamy [3], mix shape, stroke order and sound.

I suppose no such methods exist to write Japanese?

[1]: https://zh.wikipedia.org/wiki/%E8%A1%8C%E5%88%97%E8%BC%B8%E5...

[2]: https://en.wikipedia.org/wiki/Cangjie_input_method

[3]: https://en.wikipedia.org/wiki/Boshiamy_method


Something similar is even built into many pinyin IMEs. To take the same example OP used, Microsoft Pinyin (the default Chinese input method on Windows computers) will let you input 榎 by typing umuxia.

(Explanation: U puts it into component mode, then mu = 木 and xia = 夏)

It's a bit more finicky than OP's version though because the order is important, uxiamu won't give you any useful results.
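
A toy sketch of why order can matter, assuming the lookup is keyed on a sequence of components rather than a set. I don't know how Microsoft Pinyin is actually implemented; the reading table, the character table, and the function names are all made up for illustration.

```python
READINGS = {"mu": "木", "xia": "夏"}   # pinyin -> component (toy data)
ORDERED = {("木", "夏"): ["榎"]}       # component sequence -> characters (toy data)

def split_readings(text):
    """Greedy longest-match split of a pinyin string into known readings."""
    out = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in READINGS:
                out.append(text[i:j])
                i = j
                break
        else:
            return None  # some part of the input is not a known reading
    return out

def component_mode(typed):
    """Look up input such as 'umuxia': a leading 'u', then component readings."""
    if not typed.startswith("u"):
        return []
    readings = split_readings(typed[1:])
    if readings is None:
        return []
    key = tuple(READINGS[r] for r in readings)
    return ORDERED.get(key, [])

forward = component_mode("umuxia")
reverse = component_mode("uxiamu")
```

Keying on a tuple is what makes `uxiamu` miss here; a set-based key, which is closer to what OP's tool appears to do, would accept either order.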


In Japanese, character input is basically 100% about readings. As such TFA isn't analogous to an IME, it's just an informational reference for looking up unfamiliar characters. If you wanted to type the character, you'd probably look up its reading and then enter that into your Japanese IME.


FWIW, on Android at least, there's Google手書き入力 which does a pretty fantastic job in my experience. It's good enough that funky stroke order, missing strokes, and full-on cursive still usually Just Work.


I normally use the Kanji draw [1] application, which is also surprisingly good at recognizing what I'm trying to input. It's not nearly as forgiving as Google's solution [2], which I sometimes have to fall back to, but usually it works if I can at least roughly guess what the official way of drawing a character is and check for inexact matches. Plus it's FOSS.

[1]: https://f-droid.org/en/packages/ch.seto.kanjirecog/

[2]: "Note that this will NOT work - at all - if you don't know basically how to draw kanji. If you just draw something any old way that looks like it, it certainly won't be recognised."


I'm not aware of any methods like that in widespread use for Japanese.

I think in some IMEs you can search kanji by radicals, but you have to type the radicals phonetically first.


That's surprising.

I only learnt Mandarin Chinese for a few months, but I did find the app "Pleco". The most useful parts are the handwriting recognition -- which worked even on my beginner-writing -- and the live OCR.

But there's also an "assemble the character from bits" thing[2], though it relies on you knowing how many strokes are in each component. I use(d) it occasionally when I couldn't use OCR, as it was fairly slow (and inexact) for me to count strokes.

https://www.pleco.com/

[2] https://i.imgur.com/lvm0Uln.png (I only pressed 木. 夏 has 10 strokes, and 榎 is listed further down in the ⑩ section, with 45 other characters, presumably in an order)


It's not surprising when you think about how Japanese is written. A typical Japanese sentence will have a few Kanji and a lot of phonetic Kana:

日本語はちょっと書きにくい。

It would make no sense to switch between phonetic input and writing Kanji by pieces, so instead purely phonetic IMEs are used.
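
You can see the mix by counting scripts in that sentence. A small sketch, using the Unicode ranges for hiragana (U+3040 to U+309F), katakana (U+30A0 to U+30FF), and the main CJK Unified Ideographs block (U+4E00 to U+9FFF); extension blocks are ignored for simplicity, and `script_of` is just a name I picked.

```python
from collections import Counter

def script_of(ch):
    """Classify one character by Unicode block (main blocks only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"

counts = Counter(script_of(ch) for ch in "日本語はちょっと書きにくい。")
```

For this sentence that gives 4 kanji, 9 hiragana, and 1 other character (the full stop).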


There really isn’t a good Japanese counterpart to Pleco.

Pleco has said they’ve looked into producing an app for Japanese, but apparently all the good dictionaries and handwriting engines are locked up in exclusive deals with publishers and denshi jisho.

We are starting to see handwriting engines that are good and not under exclusive licenses, but the technology for it has existed for decades.


Further to this, I use Pleco for Chinese and I'm not sure any Japanese app gets close to it.

So far I have quite successfully used Nihongo for reading and studying Japanese documents. It is expensive, but its OCR helps a lot with my homework. (It's not as good as Pleco, which does live OCR.)

I wonder about those Sharp and Casio ... but they do not do OCR.


It is a surprise. Many input methods don't care about how a character is read, only about its decomposition into parts (not radicals). So why not here?

And with such a large table, once again, it is hard to use. It should have studied the Chinese input methods and used the 26 keys instead of a table.

Anyway, I'm also interested in the website and how the JSON works ...


This looks cool!

One suggestion - when clicking an element on the top page, you might consider treating that as "add this element to the search string", rather than as "show the search results for this element". The idea being, many users won't offhand know how to get their IME to produce characters like 亅 or 乚 or 儿, so they'll probably click on such elements, intending to then refine their search.

Edit: second piece of feedback - there are a couple of cases where katakana are used to refer to certain radicals, and it would be an easy usability improvement to alias those characters. E.g. ル→儿、ウ→宀 and so on.


Thanks! For the second piece of feedback, I actually already do this for a few characters, like ム. ル is another good candidate. I am not sure if ウ is a bit of a stretch... Note this is also done for radicals with obvious correspondences, like 心草水獣, so you can search 水木 for 淋.

For the first piece of feedback, I have had other people request that, but I am not sure how to balance it with having detailed kanji pages. One thing to note is that while it is less convenient, on the detailed kanji page there is a link to "search this kanji" - I use that when I know a character that contains the part I want, but can't remember the name of the part itself.


ウ is not really a stretch. It's called ウ冠 after all.

BTW, this kind of search is common in 電子辞書. Mine is almost 20 years old and has a 部品読み search.


Ah, right you are - I didn't realize. I'll have to add that.

I had heard of 電子辞書 having this feature, but I've never actually used one. Glad to see that 部品 was the right word, I had been kind of uncertain about that and gotten feedback about it.

https://jp.sharp/sc/eihon/pa660/text/kangi.html


> not sure if ウ is a bit of a stretch

They don't look identical, but the radical is often referred to that way, e.g. when describing a kanji over the phone. Also same thing with ワ and 冖.
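
A sketch of how such aliasing might work as a normalization step before the search. The table only contains the katakana pairs mentioned in this thread plus two variant forms (亻 and 氵) mapped to their full characters; it is a guess at the mechanics, not the site's actual table, and `normalize_query` is a hypothetical name.

```python
# Guessed alias table: typed character -> part used in the index.
ALIASES = {
    "ル": "儿", "ウ": "宀", "ワ": "冖", "ム": "厶",  # katakana look-alikes
    "亻": "人", "氵": "水",                          # variant forms -> full form
}

def normalize_query(query):
    """Replace each aliased character; leave everything else untouched."""
    return "".join(ALIASES.get(ch, ch) for ch in query)

example = normalize_query("ウ女")
```

Characters without an alias pass through unchanged, so the step is safe to run on every query.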


Thanks, I'll stick those in!


> many users won't offhand know how to get their IME to produce characters like ... 儿

One of the most common characters in (simplified) Chinese? It's number 192 in this dataset - http://hanzidb.org/character-list/by-frequency?page=2 - among company like 老、门、先、立、比...

Not to mention it literally defines 儿话.


But you won't find 儿 in Japanese character dictionaries. It will come up if you type ひとあし in an IME, but most people wouldn't know that.


I did the same thing a few years ago (2015) and wrote a master 1 thesis about the method (my professor didn’t like it). People from my former university didn’t understand how to make use of it despite the explanations. I still use it from time to time to search for complex chu nom.


IMHO it’s far less intuitive in comparison to radicals lookup on jisho.org. Most notably, it lacks stroke count in radicals and found characters.


I'm surprised I couldn't find 亻 here. I understand it can be found under 人, but that's not going to be obvious to a lot of beginners.


Mega thanks for making and posting this. I've been searching for a tool to "destructure" exactly this way... not necessarily down to radicals, but to any part that has meaning by itself.

Thank you for citing your sources too. As others have mentioned, much of the fun of learning is developing the tools, and this thread helps fill a gaping hole in mine.


Reminds me a bit of http://zhongwen.com which also enables radical search/learning http://zhongwen.com/bushou.htm


I think I'm doing something wrong. I typed "ひと" into the input and got no results (even though the first entry matches this query).

Anyway I've been using this: https://kanji.sljfaq.org/mr-old.html

as my daily driver and while it looks kind of dated, it's rock solid.

What I'm still searching for though is a table of jōyō kanji with their uniquely identifying radicals.


It's not a reading->kanji search, it's a part->whole search.

I.e. search for 人 and one of the results will be 囚 since it has 人 in it.
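
In other words, something like an inverted index from parts to the kanji containing them. A minimal sketch with a hand-made decomposition table; I'm only guessing at the mechanics, and the names here are hypothetical.

```python
from collections import defaultdict

# Hand-made illustration data: kanji -> string of its parts.
DECOMP = {"囚": "囗人", "休": "人木", "淋": "水木木", "訓": "言川"}

# Inverted index: part -> set of kanji that contain it.
INDEX = defaultdict(set)
for kanji, parts in DECOMP.items():
    for part in parts:
        INDEX[part].add(kanji)

def search(query):
    """Return the kanji that contain every part in the query string."""
    sets = [INDEX.get(part, set()) for part in query]
    return set.intersection(*sets) if sets else set()

hits = search("人")
```

Under this model the kana ひ and と are not parts of anything, so a query of ひと comes back empty until the IME converts it to 人.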


I'm not affiliated with it but have used the app for years, so I can recommend Akebi [1] for Android users. It lets you search by part as well, and it's all local storage with no ads or feature bloat.

[1] https://play.google.com/store/apps/details?id=com.craxic.ake...


Looks like http://kanji.wareya.moe/ ( or even older https://www.chise.org/ids-find ) with a slightly different UI.


Show this to Kanjiklub.


Careful, you might get eaten by a rathtar


I have been using jisho.org to great effect: you add parts, grouped by number of strokes, in the Radicals drop-down. It can be quite intuitive but I can't always figure it out.

I see this is using a different approach, so I look forward to trying it out!


My go-to for this is jisho, although it completely fails to be usable on phones.

I've more recently started just using the Google IME handwriting input, which uses some ML magic that lets it be extremely fuzzy about stroke order/count, relative placement, etc. I think it must incorporate some sort of image/bitmap-based recognition that many others do not. I imagine it wouldn't be as useful for Chinese, but for Japanese it works quite well.

You can use it on Android and on the Google Translate website.



