FWIW, this would probably present a challenge in most (all?) languages.
For example, in libc++ an std::string has a minimum size of 24 bytes due to SSO (the small-string optimization).
For a billion strings of fewer than 15 chars (+ the null byte), that gets you to 24 GB, and that’s optimistically assuming each string is stored in place.
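A quick way to check the per-object cost on your own toolchain is a sketch like the one below (the helper names are made up for illustration). sizeof(std::string) is implementation-specific: 24 bytes in libc++ and 32 bytes in libstdc++ on typical 64-bit targets, so the same billion short strings would cost 32 GB with libstdc++. The inline check pokes at an implementation detail, not something the standard guarantees.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Lower bound on memory for n std::string objects whose contents fit in the
// small-string (SSO) buffer: only the objects themselves count, because no
// heap allocation happens. sizeof(std::string) is implementation-specific.
constexpr std::uint64_t inline_string_bytes(std::uint64_t n) {
    return n * sizeof(std::string);
}

// True if s's characters live inside the std::string object itself, i.e.
// SSO kicked in. This inspects an implementation detail, not a guarantee.
bool is_stored_inline(const std::string& s) {
    auto obj = reinterpret_cast<std::uintptr_t>(&s);
    auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars >= obj && chars < obj + sizeof(std::string);
}

// Prints the estimate for whichever standard library this was built against.
void print_estimate() {
    constexpr std::uint64_t kBillion = 1'000'000'000;
    std::string short_str(14, 'x');  // fewer than 15 chars
    assert(is_stored_inline(short_str));
    std::printf("sizeof(std::string) = %zu bytes\n", sizeof(std::string));
    std::printf("1e9 short strings  >= %.0f GB\n",
                inline_string_bytes(kBillion) / 1e9);
}
```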
I doubt heap-allocated char* would do much better either. Just having a billion 8-byte pointers eats 8 GB before a single character is stored, and every small allocation adds malloc bookkeeping on top. You’d really need some sort of string-packing scheme similar to what you did in Java.
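Here is a minimal sketch of one such packing scheme, assuming the strings are immutable once added. PackedStrings and its methods are hypothetical names, and this is just one possible layout, not necessarily what the Java version did: all characters go into one buffer, and each string costs a 4-byte start offset instead of a 24-byte object or an 8-byte pointer plus its own allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string_view>
#include <vector>

// A minimal string-packing sketch: all characters go into one contiguous
// buffer and a parallel array records where each string starts. Offsets are
// 32-bit, so one pool holds at most ~4 GiB of characters; beyond that you
// would shard across several pools. Strings are not null-terminated.
class PackedStrings {
public:
    // Appends s and returns its index.
    std::size_t add(std::string_view s) {
        if (bytes_.size() + s.size() > std::numeric_limits<std::uint32_t>::max())
            throw std::length_error("pool full; start a new shard");
        starts_.push_back(static_cast<std::uint32_t>(bytes_.size()));
        bytes_.insert(bytes_.end(), s.begin(), s.end());
        return starts_.size() - 1;
    }

    std::size_t size() const { return starts_.size(); }

    // A view into the shared buffer; a later add() may reallocate the buffer
    // and invalidate it.
    std::string_view get(std::size_t i) const {
        assert(i < starts_.size());
        std::size_t begin = starts_[i];
        std::size_t end = (i + 1 < starts_.size()) ? starts_[i + 1] : bytes_.size();
        return std::string_view(bytes_.data() + begin, end - begin);
    }

    // Characters plus 4 bytes of index per string (ignores vector slack).
    std::size_t payload_bytes() const {
        return bytes_.size() + starts_.size() * sizeof(std::uint32_t);
    }

private:
    std::vector<char> bytes_;
    std::vector<std::uint32_t> starts_;
};
```

If the strings averaged 10 characters (a made-up figure), a billion of them would come to about 10 GB of characters plus 4 GB of offsets, compared with 24 GB for the libc++ std::string objects alone.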
It's a lot easier to build custom allocators in C++ though.
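As an illustration of how little code a custom allocator takes, here is a minimal bump ("arena") allocator for immutable strings. StringArena is a hypothetical name, and the sketch ignores alignment because it only ever stores chars.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// A minimal bump ("arena") allocator for immutable strings: memory is carved
// out of large blocks and never freed individually, so each string costs only
// its own bytes (no per-allocation malloc header). Everything is released at
// once when the arena is destroyed.
class StringArena {
public:
    explicit StringArena(std::size_t block_size = 1 << 20)
        : block_size_(block_size) {}

    // Copies s into the arena and returns a view of the stored copy. Views
    // stay valid for the arena's lifetime because blocks never move.
    std::string_view store(std::string_view s) {
        if (s.empty()) return {};
        if (s.size() > remaining_) {
            // Start a new block; leftover space in the old one is abandoned.
            std::size_t n = s.size() > block_size_ ? s.size() : block_size_;
            blocks_.push_back(std::make_unique<char[]>(n));
            cursor_ = blocks_.back().get();
            remaining_ = n;
        }
        std::memcpy(cursor_, s.data(), s.size());
        std::string_view out(cursor_, s.size());
        cursor_ += s.size();
        remaining_ -= s.size();
        return out;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::vector<std::unique_ptr<char[]>> blocks_;
    char* cursor_ = nullptr;
    std::size_t remaining_ = 0;
};
```

Since nothing is freed individually, this fits a load-once, read-many workload. In practice you'd probably keep a compact index into the arena rather than a 16-byte string_view per string.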
In Java, for one, the maximum mmap size is 2 GB, and as a cherry on top of that turd, you have no control over the mappings’ lifecycle. The language is very clearly not designed for this type of work, and if you try to make it do it anyway, it fights you every step of the way.
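For contrast, this is roughly what a mapping looks like in C++ on a POSIX system (a minimal sketch; MappedFile is a hypothetical name, and errors are simply thrown). The length is a size_t, so there's no 2 GB cap on 64-bit systems, and munmap runs exactly when the object is destroyed. The flip side is that any pointer or view into the mapping dangles after that, with nothing stopping another thread from reading through it.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string_view>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only mapping of a whole file (POSIX only). munmap runs
// deterministically in the destructor rather than whenever a GC decides.
// Any view obtained from bytes() dangles once the object is destroyed.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed");
        struct stat st {};
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed");
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed");
            }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    std::string_view bytes() const { return {data_, size_}; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```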
It has nothing to do with the language and everything to do with the API / VM capabilities.
Both of those restrictions are fixed. Granted, this is not yet in the official API: it's an incubator module (the equivalent of Python's from __future__).
Right: specifically, they invented a way to make closing an mmapped segment safe and fast. The reason you can't officially (without private APIs) unmap something in current Java is that if you did, other threads would segfault, and "no segfaults" is kind of a defining characteristic of Java. The new API fixes this with more VM magic: closing a mapping from one thread causes other threads to throw exceptions, and this is done without sacrificing performance (it doesn't require checking a status on every memory read).