Kanji Club: Search Kanji by Parts with Instant Feedback

wirthjason · on March 29, 2021

Probably 80% of Japanese students are more interested in making tools to study Japanese than actually studying Japanese. It’s a kind of yak shaving. I know this applies to me. (Which is why my Nihongo is crap.) I’m not sure if this happens with other languages, East Asian or not.

wk_end · on March 29, 2021

Not totally surprising. It's a natural human impulse (maybe?) to want to build or improve your tools when working on something, especially when it feels like your tools are inadequate. And while there are good resources for learning Japanese, they're few and far between in my experience. And nothing, so far as I know, is completely comprehensive on its own.

Personally, I've never gone ahead and made my own Japanese learning app or anything like that, but while studying I do pretty regularly think, "man, this would be so much better if it did X; I wish I had an app to do Y", etc., etc.

Because of Western attitudes towards Japanese pop culture, a lot of people learning Japanese are...well, they're gonna be kinda nerdy, frankly. And that means they're more likely than average to have the sort of skills that allow them to build (software) tools to help them study.

Throw in that learning Japanese, at least for Westerners, is a monumental task mostly driven by intellectual curiosity (rather than, say, economic opportunity), and it'll attract the sort of ambitious and intellectually curious people who'll actually endeavour to make their own tools, with the necessary skills to do it.

luigi23 · on March 29, 2021

> Because of Western attitudes towards Japanese pop culture, a lot of people learning Japanese are...well, they're gonna be kinda nerdy, frankly. And that means they're more likely than average to have the sort of skills that allow them to build (software) tools to help them study.

Whoa, you nailed it here. Never thought of it this way.

mumblemumble · on March 29, 2021

> while studying I do pretty regularly think, "man, this would be so much better if it did X; I wish I had an app to do Y", etc., etc.

This is more or less exactly why, despite having played with a lot of different tools, I always come back to Anki. The UI is irritatingly unpolished, and it's difficult to learn well, but it's also the only tool out there that's open and flexible enough that I can easily experiment with and tweak what I'm doing without disrupting my overall learning process overmuch.

I've certainly nerdsniped myself a few times, especially when I'm trying to build something from the ground up because I'm currently toying with some other app. With Anki, though, most of my "app to do Y" ends up just being a Jupyter notebook that I hacked together in less than an hour.

Bigpet · on March 29, 2021

> It's a natural human impulse

I don't know about that. It certainly seems common amongst the programmer crowd. But I've found that there's plenty of people capable of doing the "5 hours of routine work" without ever thinking "If I spend like 4 hours engineering this weird hack, then I might get this done really fast" (and then of course ending up wasting days trying to get it to work)

totetsu · on March 29, 2021

What character decomposition database is this using? It's not as straightforward as it might first seem to say which characters are made of which other ones. Some projects I forked over the years:

I modified this to use the missed reviews on kanjikoohi as the difficulty factor to define the optimal path to learn the parts of kanji. https://github.com/scriptin/topokanji

This one shows the parts in a semantic web type view. https://github.com/kanji-graph/kanji-graph
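For anyone wondering what "a path to learn the parts" could look like in practice, here is a minimal sketch using a plain topological sort, so every part is studied before the kanji that contains it. The decomposition table is a hand-written toy, and this is not how topokanji itself works; it also ignores the difficulty weighting from missed reviews described above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy decomposition: each kanji maps to the parts it is built from.
PARTS = {
    "訓": ["言", "川"],
    "榎": ["木", "夏"],
    "淋": ["氵", "林"],
    "林": ["木"],
}

def learning_order(parts):
    """Return characters ordered so every part comes before the kanji using it."""
    # TopologicalSorter expects a mapping of node -> its predecessors,
    # which is exactly "kanji -> parts that must be learned first".
    sorter = TopologicalSorter(parts)
    return list(sorter.static_order())

order = learning_order(PARTS)
print(order)
```

The exact order is not unique; the only guarantee is that 木 appears before 林, and 林 before 淋. A difficulty score could be layered on top by choosing among the currently available characters instead of taking them in arbitrary order.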

arata · on March 29, 2021

> What character decomposition database is this using..

From the about page, it seems that it's using Wikimedia data: https://commons.wikimedia.org/wiki/Commons:Chinese_character... .

To add to your comment, there's also RADKFILE/KRADFILE, which is used by a lot of Japanese dictionaries out there (including jisho.org), and also IDS (Ideographic Description Sequence) data: https://github.com/cjkvi/cjkvi-ids . The latter, I believe, is not meant for general lookup, but nonetheless can be quite informative, such as identifying semantic/phonetic components.
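To make the IDS idea concrete: an IDS is a prefix-notation string in which a layout operator (U+2FF0 to U+2FFB) is followed by its components. Below is a rough sketch that flattens such a string into its leaf components. It assumes you have already extracted a bare IDS string; it does not parse the cjkvi-ids file format, and the example string is my own illustration rather than a line copied from that data.

```python
# Arity of each Ideographic Description Character (U+2FF0..U+2FFB).
# All take two operands except U+2FF2 and U+2FF3, which take three.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY[chr(0x2FF2)] = 3  # left / middle / right
IDC_ARITY[chr(0x2FF3)] = 3  # top / middle / bottom

LEFT_RIGHT = chr(0x2FF0)  # the "left to right" operator

def leaf_components(ids):
    """Return the leaf components of a well-formed IDS string, left to right."""
    leaves = []
    pos = 0

    def parse():
        nonlocal pos
        ch = ids[pos]
        pos += 1
        if ch in IDC_ARITY:
            # An operator: recurse once per operand it expects.
            for _ in range(IDC_ARITY[ch]):
                parse()
        else:
            leaves.append(ch)

    parse()
    return leaves

print(leaf_components(LEFT_RIGHT + "言川"))  # ['言', '川']
```

A malformed string (an operator without enough operands) raises an IndexError here; real data would need more careful handling.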

wibr · on March 29, 2021

I'm also guilty of this. Related to this thread, some years ago I made a puzzle game with Chinese characters where you get a bunch of components and try to combine them into as many characters as possible: http://www.jiong3.com/pinzi/

It's not just yak shaving though, some programming skills can be really useful when dealing with vocab lists etc..

kenoph · on March 29, 2021

Guilty too. I had this idea to improve the order in which I should study kanjis. Ended up with a neo4j instance with kanjis, kanji components (not just radicals), words, and how they are combined.

Now I'm still using the same algorithm, but I do everything manually. It takes time but I found that the flashcard customization aspect makes the memorization easier.

zaik · on March 29, 2021

This would also explain why most of those tools have limited usefulness for learning at an advanced level. Anki and Takoboto (for keeping track of words I want to learn) are really all I use right now.

_nckn · on March 29, 2021

Except me. I don't like making tools, and I am a software engineer, ironically.

whatastory · on March 29, 2021

Yeah, I saw the headline thinking this is just a feature every Japanese dictionary offers. Surely I must be mistaken, because there's no way yet another person remade this feature yet again.

But no. It's literally one of the basic features of every kanji/hanzi dictionary.

redrobein · on March 29, 2021

Isn't this a problem SKIP already solves? Many kanji dictionaries already support it and it's also usable in cases where you don't know what a "part" is called.

https://kanji.sljfaq.org/skip-help.html https://kanji.sljfaq.org/skip.html

polm23 · on March 29, 2021

SKIP is cool but you have to learn it, whereas this just uses knowledge you already have.

cooper12 · on March 29, 2021

SKIP is about matching the kanji to a pattern and counting the strokes of the two portions. This is more about inputting the kanji's constituent parts themselves.

For example, say we have the kanji 訓. For a SKIP-based lookup, you'd see this as 言|川, and you probably know that's 7 and 3 strokes. Whereas with this approach, you could type in the parts, e.g. いう (backspace) to get 言 and かわ to get 川. A lot faster when kanji have many parts or you're not so sure about the stroke counts. Yes, SKIP would be more helpful if you don't know the parts.
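As a rough illustration of the SKIP side of that comparison: a SKIP code is a pattern number plus two stroke counts, so 訓 as 言|川 becomes pattern 1 (left/right), 7 strokes, 3 strokes. The table below is a tiny hand-entered sample, not real dictionary data, and the function names are made up for this sketch.

```python
from collections import defaultdict

# Hand-entered toy data: kanji -> (pattern, strokes in first portion, strokes in second).
# Pattern 1 means a left/right split. Verify against a real dictionary before relying on it.
SKIP_CODES = {
    "訓": (1, 7, 3),
    "記": (1, 7, 3),
    "林": (1, 4, 4),
}

def build_index(codes):
    """Invert kanji -> code into code -> list of kanji."""
    index = defaultdict(list)
    for kanji, code in codes.items():
        index[code].append(kanji)
    return index

def skip_lookup(index, pattern, first, second):
    """Return every kanji filed under the given SKIP code."""
    return index.get((pattern, first, second), [])

INDEX = build_index(SKIP_CODES)
print(skip_lookup(INDEX, 1, 7, 3))
```

Note that one code usually maps to several kanji, and a miscounted stroke gives you a different bucket entirely, which is the weakness the comment above is pointing at.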

jiehong · on March 29, 2021

In Chinese you have input methods that do exactly that. In particular Array 40 [1], which maps pieces of characters to each key on the keyboard to input them.

Other such methods, such as the more popular Cangjie [2] or Boshiamy [3], mix shape, stroke order and sound.

I suppose no such methods exist to write Japanese?

[1]: https://zh.wikipedia.org/wiki/%E8%A1%8C%E5%88%97%E8%BC%B8%E5...

[2]: https://en.wikipedia.org/wiki/Cangjie_input_method

[3]: https://en.wikipedia.org/wiki/Boshiamy_method

kevin_p · on March 29, 2021

Something similar is even built into many pinyin IMEs. To take the same example OP used, Microsoft Pinyin (the default Chinese input method on Windows computers) will let you input 榎 by typing umuxia.

(Explanation: U puts it into component mode, then mu = 木 and xia = 夏)

It's a bit more finicky than OP's version though because the order is important, uxiamu won't give you any useful results.

fenomas · on March 29, 2021

In Japanese, character input is basically 100% about readings. As such TFA isn't analogous to an IME, it's just an informational reference for looking up unfamiliar characters. If you wanted to type the character, you'd probably look up its reading and then enter that into your Japanese IME.

xelxebar · on March 29, 2021

FWIW, on Android at least, there's Google手書き入力 which does a pretty fantastic job in my experience. It's good enough that funky stroke order, missing strokes, and full-on cursive still usually Just Work.

akalsz · on March 29, 2021

I normally use the Kanji draw [1] application which is also surprisingly good at recognizing what I'm trying to input. Not nearly as forgiving as Google's solution [2], which I sometimes have to fall back to, but usually it works if I can at least roughly guess what the official way of drawing a character is and check for inexact matches. Plus it's FOSS.

[1]: https://f-droid.org/en/packages/ch.seto.kanjirecog/

[2]: "Note that this will NOT work - at all - if you don't know basically how to draw kanji. If you just draw something any old way that looks like it, it certainly won't be recognised."

polm23 · on March 29, 2021

I'm not aware of any methods like that in widespread use for Japanese.

I think in some IMEs you can search kanji by radicals, but you have to type the radicals phonetically first.

Symbiote · on March 29, 2021

That's surprising.

I only learnt Mandarin Chinese for a few months, but I did find the app "Pleco". The most useful parts are the handwriting recognition -- which worked even on my beginner-writing -- and the live OCR.

But there's also an "assemble the character from bits" thing[2], though it relies on you knowing how many strokes are in each component. I use(d) it occasionally when I couldn't use OCR, as it was fairly slow (and inexact) for me to count strokes.

https://www.pleco.com/

[2] https://i.imgur.com/lvm0Uln.png (I only pressed 木. 夏 has 10 strokes, and 榎 is listed further down in the ⑩ section, with 45 other characters, presumably in an order)

betterunix2 · on March 29, 2021

It's not surprising when you think about how Japanese is written. A typical Japanese sentence will have a few Kanji and a lot of phonetic Kana:

日本語はちょっと書きにくい。

It would make no sense to switch between phonetic input and writing Kanji by pieces, so instead purely phonetic IMEs are used.

wodenokoto · on March 29, 2021

There really isn’t a good Japanese counterpart to Pleco.

Pleco has said they’ve looked into producing an app for Japanese, but apparently all the good dictionaries and handwriting engines are locked up in exclusive deals with publishers and denshi Jisho.

We are starting to see handwriting engines that are good and not in exclusive licenses, but the technology for it has existed for decades.

ngcc_hk · on March 29, 2021

Further to this, I use Pleco for Chinese, and I'm not sure any Japanese app gets close to it.

So far I've quite successfully used this one for reading and studying Japanese documents: Nihongo. It is expensive, but it does a bit of OCR, which helps a lot with my homework. (Not as good, though, as it doesn't do live OCR like Pleco.)

I wonder about those Sharp and Casio ... but they do not do OCR.

ngcc_hk · on March 29, 2021

It is a surprise. Many input methods don't care what you type, only about the decomposition into parts, not radicals. Hence why not.

And for such a large table, once again, it is hard. It should have studied the Chinese input methods and used the 26 keys instead of a table.

Anyway, I'm also interested in the web site and how the JSON works ...

fenomas · on March 29, 2021

This looks cool!

One suggestion - when clicking an element on the top page, you might consider treating that as "add this element to the search string", rather than as "show the search results for this element". The idea being, many users won't offhand know how to get their IME to produce characters like 亅 or 乚 or 儿, so they'll probably click on such elements, intending to then refine their search.

Edit: second piece of feedback - there are a couple of cases where katakana are used to refer to certain radicals, and it would be an easy usability improvement to alias those characters. E.g. ル→儿、ウ→宀 and so on.
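A minimal sketch of what that aliasing could look like, assuming the search takes a plain query string. The table only contains the two pairs mentioned above, and the names are hypothetical; this is not the site's actual code.

```python
# Katakana that people use as stand-ins for similar-looking radicals.
# Only the two pairs from the comment above; a real table would be longer.
PART_ALIASES = {
    "ル": "儿",
    "ウ": "宀",
}

def normalize_query(query):
    """Replace each aliased katakana with the radical it stands for."""
    return "".join(PART_ALIASES.get(ch, ch) for ch in query)

print(normalize_query("ル木"))  # 儿木
```

One design question is whether to replace or to expand: a blanket replacement also rewrites katakana the user meant literally, so matching on either form might be the safer choice.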

polm23 · on March 29, 2021

Thanks! For the second piece of feedback, I actually already do this for a few characters, like ム. ル is another good candidate. I am not sure if ウ is a bit of a stretch... Note this is also done for radicals with obvious correspondences, like 心草水獣, so you can search 水木 for 淋.

For the first piece of feedback, I have had other people request that, but I am not sure how to balance it with having detailed kanji pages. One thing to note is that while it is less convenient, on the detailed kanji page there is a link to "search this kanji" - I use that when I know a character that contains the part I want, but can't remember the name of the part itself.

glandium · on March 29, 2021

ウ is not really a stretch. It's called ウ冠 after all.

BTW, this kind of search is common in 電子辞書. Mine is almost 20 years old and has a 部品読み search.

polm23 · on March 29, 2021

Ah, right you are - I didn't realize. I'll have to add that.

I had heard of 電子辞書 having this feature, but I've never actually used one. Glad to see that 部品 was the right word, I had been kind of uncertain about that and gotten feedback about it.

https://jp.sharp/sc/eihon/pa660/text/kangi.html

fenomas · on March 29, 2021

> not sure if ウ is a bit of a stretch

They don't look identical, but the radical is often referred to that way, e.g. when describing a kanji over the phone. Also same thing with ワ and 冖.

polm23 · on March 29, 2021

Thanks, I'll stick those in!

thaumasiotes · on March 29, 2021

> many users won't offhand know how to get their IME to produce characters like ... 儿

One of the most common characters in (simplified) Chinese? It's number 192 in this dataset - http://hanzidb.org/character-list/by-frequency?page=2 - among company like 老、门、先、立、比...

Not to mention it literally defines 儿话.

wtn · on March 29, 2021

But you won't find 儿 in Japanese character dictionaries. It will come up if you type ひとあし in an IME, but most people wouldn't know that.

tasogare · on March 29, 2021

I did the same thing a few years ago (2015) and wrote a master 1 thesis about the method (my professor didn’t like it). People from my former university didn’t understand how to make use of it despite the explanations. I still use it from time to time to search for complex chu nom.

rasguanabana · on March 29, 2021

IMHO it’s far less intuitive in comparison to radicals lookup on jisho.org. Most notably, it lacks stroke count in radicals and found characters.

Timpy · on March 29, 2021

I'm surprised I couldn't find 亻 here, I understand it can be found under 人 but that's not going to be obvious to a lot of beginners.

teye · on March 29, 2021

Mega thanks for making and posting this. I've been searching for a tool to "destructure" exactly this way... not necessarily down to radicals, but to any part that has meaning by itself.

Thank you for citing your sources too. As others have mentioned, much of the fun of learning is developing the tools, and this thread helps fill a gaping hole in mine.

hawflakes · on March 29, 2021

Reminds me a bit of http://zhongwen.com which also enables radical search/learning http://zhongwen.com/bushou.htm

Tade0 · on March 29, 2021

I think I'm doing something wrong. I typed "ひと" into the input and got no results (even though the first entry matches this query).

Anyway I've been using this: https://kanji.sljfaq.org/mr-old.html

as my daily driver and while it looks kind of dated, it's rock solid.

What I'm still searching for though is a table of jōyō kanji with their uniquely identifying radicals.

fenomas · on March 29, 2021

It's not a reading->kanji search, it's a part->whole search.

I.e. search for 人 and one of the results will be 囚 since it has 人 in it.
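In code terms, a part->whole search is just a subset test: a kanji matches when every part in the query is among its parts. Here is a rough sketch with a toy table built from examples in this thread; the data layout and function name are assumptions, not the site's implementation.

```python
# Toy data: kanji -> set of parts it contains (hand-picked from examples in this thread).
PARTS = {
    "囚": {"囗", "人"},
    "訓": {"言", "川"},
    "榎": {"木", "夏"},
    "淋": {"水", "木"},
}

def search_by_parts(query, parts=PARTS):
    """Return every kanji whose parts include all characters in the query."""
    wanted = set(query)
    # "wanted <= ps" is the subset test, so the order of the query does not matter.
    return [kanji for kanji, ps in parts.items() if wanted <= ps]

print(search_by_parts("人"))  # ['囚']
```

Because the query is treated as a set, 夏木 and 木夏 give the same result, and a reading such as ひと finds nothing since it is not a part of any entry.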

m3at · on March 29, 2021

I'm not affiliated with it but used the app for years, so I can recommend Akebi [1] for Android users. It lets you do a search by path as well, and it's all local storage with no ads or feature bloat.

[1] https://play.google.com/store/apps/details?id=com.craxic.ake...

NDICjQ2zlm5vJ6S · on March 29, 2021

Looks like http://kanji.wareya.moe/ ( or even older https://www.chise.org/ids-find ) with a slightly different UI.

anshumankmr · on March 29, 2021

Show this to Kanjiklub.

Metacelsus · on March 29, 2021

Careful, you might get eaten by a rathtar

ehnto · on March 29, 2021

I have been using jisho.org to great effect, you add parts grouped by number of strokes in the Radicals drop down. It can be quite intuitive but I can't always figure it out.

I see this is using a different approach, so I look forward to trying it out!

EE84M3i · on March 29, 2021

My go-to for this is jisho, although it completely fails to be usable on phones.

I've more recently started just using the Google IME handwriting input, which is using some ML magic that lets it be extremely fuzzy about stroke order/count, relative placement, etc., and I think it must incorporate some sort of image/bitmap based recognition that many others do not. I imagine it wouldn't be as useful for Chinese, but for Japanese it works quite well.

You can use it on android and on the google translate website.