At this point I really have to wonder how many people have written graph engines on top of LMDB! In the past month I've seen two from the same bank built on top of lmdbjava. Instead of reinventing this same wheel over and over, it'd probably make sense for somebody to sit down with LMDB and TinkerPop [1] and bang out one decent implementation.
...actually this has been done [2] but the project looks abandoned. So NSA guys you should get right on this.
At least this one is in Python. My dream would be a transactional graph database written in Python that can be used as a back end for NetworkX (https://networkx.github.io/).
Yes, it's like Neo4j but based on LMDB, a memory-mapped key-value store, presumably to provide high performance on a single machine.
The link mentions that its use case is streaming seed set expansion, which allows you to identify communities based on a set of seeds. I wrote more about that in this comment:
https://news.ycombinator.com/item?id=17335873
For laymen like me: what is this, and what are the perfect use cases for it?
"..log-based transactional graph (nodes/edges/properties).. ..primary use case is to support streaming seed set expansion." -- I'm totally lost.
I know this kind of software is targeted at developers, but it wouldn't hurt to give an analogy like the "Uber for XXX" of startup pitches, e.g. "It's like <put popular product name here e.g. MySQL> but <differentiating factors>".
Graph databases are a really neat thing that liberate you from the need to figure out your database schema at the outset and also allow much faster searching than traditional table-based queries across huge datasets. They're ideal for sparse data or for collections of data whose structure/relationships you're not sure about, and also allow very fast searches because the number of steps between different nodes typically grows more slowly than the number of records between different table entries.
There are a bunch of them on the market; Neo4j is probably the most popular (and has lots of good-quality introductory material on its website and on YouTube). Graph databases are key to many major internet services, e.g. Google, Twitter, and Facebook are all just really big graph databases.
This particular graph database stores all its data in a single file, trading flexibility for speed and simplicity. I'm not an expert, but it seems like it would work very well for search queries and poorly for tasks involving a lot of contributors, like a chat server.
Based on the description on the site, the NSA uses this as a way to identify communities based on their communication patterns. This is a kind of social network analysis:
https://en.wikipedia.org/wiki/Social_network_analysis
The particular use case mentioned for LemonGraph is "streaming seed set expansion": given a set of "seeds", it expands that set based on their communication patterns to find people who are likely in the same community or in overlapping communities.
E.g., if you have a few known terrorists and a database of metadata about phone calls or internet communications (emails etc.), you can analyze who your known subjects (seeds) talk to and who those people talk to, etc., to identify communities. This relies on the fact that there tends to be cross-communication between people in the same community.
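To make that concrete, here is a rough sketch of the idea in plain Python. It is not LemonGraph's API or its actual algorithm, just a breadth-first expansion from the seeds over a list of (caller, callee) pairs. The names `expand_seeds`, `calls`, and `max_hops` are made up for illustration.

```python
from collections import deque

def expand_seeds(calls, seeds, max_hops=2):
    """Return {person: hops} for everyone within max_hops of any seed."""
    # Build an undirected adjacency map from (caller, callee) pairs.
    neighbors = {}
    for a, b in calls:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search outward from all seeds at once.
    hops = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand past the hop limit
        for other in neighbors.get(person, ()):
            if other not in hops:
                hops[other] = hops[person] + 1
                queue.append(other)
    return hops

# Hypothetical call metadata: alice is the only known seed.
calls = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("erin", "frank")]
community = expand_seeds(calls, seeds=["alice"], max_hops=2)
```

With a two-hop limit this picks up bob and carol but not dave (three hops away) or erin and frank (no connection to the seed). A real system would also weight edges and look at how densely the candidates talk to each other, not just hop distance.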
This kind of analysis can often reveal the structure of communities, like where the headquarters is, who the boss is, etc.
In industry and law enforcement, the same kinds of approaches can be used to identify fraud of various kinds.
"What it's like" is a graph database, in this case built on a fast memory-mapped key-value store (LMDB) to support high-speed graph analysis on a single machine.
In this case, a "graph" is an organized way to store and represent data, especially in cases where you want to store relationships (edges) between different things (nodes).
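If it helps, the nodes/edges/properties idea fits in a few lines of plain Python. This is only a toy illustration, not how LemonGraph stores anything; `add_node`, `add_edge`, and the property names are invented for the example.

```python
nodes = {}   # node id -> dict of properties
edges = []   # list of (source, target, properties)

def add_node(node_id, **props):
    # Create the node if needed, then merge in any properties.
    nodes.setdefault(node_id, {}).update(props)

def add_edge(src, dst, **props):
    # Make sure both endpoints exist before recording the relationship.
    add_node(src)
    add_node(dst)
    edges.append((src, dst, props))

add_node("alice", kind="person")
add_edge("alice", "bob", kind="called", minutes=3)

# "Who did alice call?" is a walk over her outgoing edges.
contacts = [dst for src, dst, _ in edges if src == "alice"]
```

The point is that the relationship itself ("alice called bob for 3 minutes") is a first-class record, so following connections does not require joining tables.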
Streaming is just referring to its ability to be constantly updated by incoming data.
So it's like a general-purpose database, but built for a much narrower use case, where it can perform much better.
[1] http://tinkerpop.apache.org/docs/current/reference/
[2] https://github.com/pietermartin/thundergraph