I have seen the term "trigram" twice today on HN. I am familiar with the term only for something like on the South Korean flag [1]. Interesting to see this being used for a special case of n-grams in language processing.
It's not really special. It used to be quite common, as were bigrams (n=2). In the Roman alphabet, 3 characters give a fair trade-off between precision and recall, making them suitable for things like spelling error correction.
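To make the spelling-correction use concrete, here is a minimal sketch that ranks dictionary words by how many character trigrams they share with a misspelled word. The function names, the space padding, the Jaccard measure and the 0.3 threshold are all assumptions picked for illustration, not something any particular spell checker is known to use.

```python
def char_trigrams(word: str) -> set[str]:
    """Return the set of overlapping 3-character windows in `word`."""
    # Padding with spaces is an assumption: it lets word boundaries
    # contribute trigrams, so short words still produce something.
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    return len(ta & tb) / len(ta | tb)


def suggest(misspelled: str, vocabulary: list[str], threshold: float = 0.3) -> list[str]:
    """Return vocabulary words scoring at least `threshold`, best first."""
    scored = [(jaccard(misspelled, w), w) for w in vocabulary]
    return [w for score, w in sorted(scored, reverse=True) if score >= threshold]


if __name__ == "__main__":
    print(suggest("trigarm", ["trigram", "bigram", "postgres"]))
```

Raising the threshold trades recall for precision; lowering it does the opposite.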
They also give you good performance. I'm a maintainer of a specialised open-source 3gram database, and there are a lot of performance tricks you can do with 3grams. There are around 2**24 different 3grams (fewer if you only care about printable characters). This is small enough to pack into a single array on a modern computer. For example the file format I use has a header that reserves 8 bytes of data [1] for every possible 3gram - that's just 128MB of overhead per index, not a lot of disk storage. For 4grams, that's 32GB, much less acceptable even now. And 2grams are just not very useful for large datasets.
[1] for the curious - the header is basically uint64_t file_offsets_to_3gram_data[2**24].
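The arithmetic behind those numbers, and the way a 3gram can double as an array index, can be sketched roughly like this. The helper names, the byte order used to pack the 3gram, and the little-endian offset encoding are assumptions for illustration; the actual file format may differ.

```python
import struct

ENTRY_SIZE = 8  # one uint64_t file offset per possible n-gram


def trigram_index(gram: bytes) -> int:
    """Pack 3 bytes into a 24-bit integer usable as an array index."""
    if len(gram) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Assumed layout: first byte is the most significant.
    return (gram[0] << 16) | (gram[1] << 8) | gram[2]


def header_size_bytes(n: int) -> int:
    """Size of a header with one 8-byte slot per possible n-byte gram."""
    return (256 ** n) * ENTRY_SIZE


def read_offset(header: bytes, gram: bytes) -> int:
    """Look up the stored file offset for `gram` in an in-memory header."""
    # "<Q" = little-endian unsigned 64-bit; the endianness is an assumption.
    (offset,) = struct.unpack_from("<Q", header, trigram_index(gram) * ENTRY_SIZE)
    return offset


if __name__ == "__main__":
    print(header_size_bytes(3) // 2**20, "MiB for 3grams")
    print(header_size_bytes(4) // 2**30, "GiB for 4grams")
```

Because the index is computed directly from the bytes, a lookup is one multiplication and one read, with no hashing or tree traversal.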
Much of Google search ranking is based (well, was ten years ago) on bigram and trigram statistics. Short of a real n-gram model this is remarkably effective but has some obvious failure modes.
Postgres implements this[0] as well, and it's really wonderful.
It doesn't give a human the search experience they are used to, but for the superhuman who can write regex, this becomes a very cheap way to search data at scale.
I use trigram indices on a project I run[0] where I want to do cheap filtering of DB results and the performance is just outstanding; I didn't think free text search could be so fast!
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX IF NOT EXISTS lowercase_title ON streams (lower(title));
CREATE INDEX IF NOT EXISTS title_trgm ON streams USING gin (lower(title) gin_trgm_ops);
And boom, super performant search via `lower(title) LIKE '%{}%'` (the query has to use the same `lower(title)` expression for the index to apply).
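For anyone wondering why a trigram index makes a substring search cheap, the general idea can be sketched in a few lines. This is only a toy illustration and not how pg_trgm's GIN index is actually implemented; `TrigramIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All overlapping 3-character windows of `text`, lowercased."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index: trigram -> ids of documents containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, needle: str) -> list[int]:
        """Case-insensitive substring search, like LIKE '%needle%'."""
        needle = needle.lower()
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 characters: the index cannot narrow
            # anything down, so every document is a candidate.
            candidates = set(self.docs)
        else:
            # A match must contain every trigram of the needle.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Sharing all trigrams is necessary but not sufficient,
        # so recheck the surviving candidates against the real text.
        return sorted(d for d in candidates if needle in self.docs[d].lower())


if __name__ == "__main__":
    idx = TrigramIndex()
    idx.add(1, "Live coding in Rust")
    idx.add(2, "Trigram search demo")
    idx.add(3, "Rusty trigrams")
    print(idx.search("rust"))
```

The expensive substring check only runs on the handful of rows that survive the set intersection, which is where the speed comes from.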
I also love taking advantage of `TABLESAMPLE system_rows()` which lets me do hyperfast random selection without needing to randomly sort the entire table. PG has so many hidden gems.
https://codesearch.debian.net/
(If you want a lot more details, check out my thesis on it: https://codesearch.debian.net/research/bsc-thesis.pdf)