Slightly off-topic, but does anyone know why this works on macOS?

  $ echo "foo" >"ß.txt"
  $ cat "ss.txt"
  foo


macOS's default filesystems, HFS+ and APFS, have historically been case-insensitive, although case-sensitive variants exist. In the Unicode database the uppercase mapping of ‘ß’ is recorded as upper('ß') → 'SS', following the most common usage. A capital ẞ exists, but is rather new. One can assume that your filesystem does their filename comparisons and possible storage with `lower(upper(filename))` or such.

Another pitfall of Mac filesystems is Unicode normalization of precomposed characters, which changed between HFS+ and APFS I think.
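
To make the normalization pitfall concrete, here is a small Python sketch. It is only an illustration of precomposed vs. decomposed forms using the standard `unicodedata` module, not a claim about what HFS+ or APFS does internally; the file name is made up.

```python
import unicodedata

# The same visible name, encoded two different ways.
precomposed = "\u00e4.txt"   # 'ä' as one code point (the NFC form)
decomposed = "a\u0308.txt"   # 'a' plus COMBINING DIAERESIS (the NFD form)

# A plain code point comparison treats them as different names.
same_raw = precomposed == decomposed

# After normalizing both to one form, they compare equal.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))

print(same_raw, same_nfc)
print(len(precomposed), len(decomposed))
```

Whether two such names end up as the same file depends on which normalization, if any, the filesystem applies.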


> One can assume that your filesystem does their filename comparisons and possible storage with `lower(upper(filename))` or such.

The correct way to compare Unicode strings case-insensitively involves "case folding". Full case folding maps "ß" directly to "ss".
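
A quick way to see this is Python's `str.casefold()`, which applies Unicode full case folding. This is just a sketch of the string behaviour; I am not claiming macOS uses this exact routine for filenames.

```python
names = ["ß.txt", "ss.txt", "SS.TXT", "ẞ.txt"]

# Full case folding maps every variant to the same key.
folded = {name.casefold() for name in names}

# Plain lowercasing is not enough: 'ß' is already lowercase and stays 'ß'.
lower_only = "ß.txt".lower()

# The lower(upper(...)) guess from the parent comment gives the same
# result for this name, because upper('ß') is 'SS'.
roundtrip = "ß.txt".upper().lower()

print(folded)
print(lower_only == "ss.txt", roundtrip == "ss.txt")
```

So comparing `a.casefold() == b.casefold()` treats "ß.txt" and "ss.txt" as the same name, while comparing `a.lower() == b.lower()` does not.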


Gotcha! The crucial piece I was missing was that Unicode case-folding can turn a single codepoint into many.

One more thing to add to the list of falsehoods programmers believe about Unicode, I guess :-)


In German you can write 'ss' for 'ß'. So 'spaß' (fun) becomes 'spass'.

And an 'e' after the vowel can be used to encode an umlaut (the diaeresis mark). For example 'spät' (late) becomes 'spaet'.

It makes sense to apply this encoding for filenames, since not all software may support Unicode filenames. The file may one day be transferred to another OS, etc.
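
If you want to apply that encoding yourself before creating files, a minimal Python sketch could look like this. The table and the `ascii_filename` helper are my own hypothetical names, and the mapping covers only the German letters mentioned here.

```python
import unicodedata

# Hypothetical mapping for German only: umlauts become vowel + 'e', 'ß' becomes 'ss'.
GERMAN_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def ascii_filename(name: str) -> str:
    """Transliterate German umlauts and 'ß' in a file name to ASCII."""
    # Normalize to NFC first so decomposed umlauts (a + combining mark)
    # become single code points that the table can match.
    return unicodedata.normalize("NFC", name).translate(GERMAN_ASCII)

print(ascii_filename("spät.txt"))
print(ascii_filename("spaß.txt"))
```

Other non-ASCII characters pass through unchanged, so this is not a general transliterator.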



