Why use a self-made Lisp here? Why not use an existing one? Is it because ...

stinkypete · on Nov 26, 2012

Probably has something to do with this statement, repeated many times on the page:

  We are able to do this, because starting our LISP-VM and doing all these processes on every
  request, is still many times faster than having a VM (with a garbage collector) online all the time.

Self-made LISPs usually aren't that complicated to write, and if you're taking shortcuts like preallocating chunks of memory rather than writing a full-blown VM GC system, then things become even easier. My guess is that most of the LISP code doing the data analysis is composed of calls into primitives written in C that do all the heavy lifting so the performance hit of an interpreted or bytecode-interpreted language is minimal.
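To make that guess concrete, here is a minimal sketch of the "thin interpreter over heavy primitives" shape. It is in Python rather than C for brevity, and every name in it (`PRIMITIVES`, `run`, the primitive set) is made up for illustration; it is not the authors' code. The interpreter only parses and dispatches, while the primitives stand in for the native routines that would do the real work.

```python
import math
from typing import Any, Callable, Dict, List


def tokenize(src: str) -> List[str]:
    """Split an s-expression string into parens and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> Any:
    """Consume tokens from the front and build a nested-list AST."""
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":
            form.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return form
    try:
        return float(tok)  # numeric literal
    except ValueError:
        return tok  # anything else is a symbol


# Hypothetical primitives: stand-ins for the native routines that
# would do the heavy lifting in a real system.
PRIMITIVES: Dict[str, Callable[..., Any]] = {
    "list": lambda *xs: list(xs),
    "sum": lambda xs: math.fsum(xs),
    "mean": lambda xs: math.fsum(xs) / len(xs),
    "max": lambda xs: max(xs),
}


def evaluate(form: Any) -> Any:
    """Evaluate numbers, symbols, and primitive applications."""
    if isinstance(form, float):
        return form
    if isinstance(form, str):
        return PRIMITIVES[form]  # symbols resolve to primitives only
    fn = evaluate(form[0])
    args = [evaluate(arg) for arg in form[1:]]
    return fn(*args)


def run(src: str) -> Any:
    return evaluate(parse(tokenize(src)))
```

Something like `run("(mean (list 1 2 3 6))")` spends almost all of its time inside `mean`, which is why the interpretation overhead matters little when the primitives are coarse-grained.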

Having to maintain a small LISP interpreter in C is definitely an extra thing to keep up with, but you have to balance that against the work that would normally go into avoiding long GC pauses once you start tracking a nontrivial number of objects in a VM with garbage collection. Java, for instance, doesn't support separate heaps, which means that a background piggy processing thread that is haphazardly allocating objects can cause "core" threads doing I/O to pause for several seconds for full GCs. Even with Java's fairly sophisticated heap management schemes, it is still very difficult to design a system that completely avoids full GC pauses. Erlang is somewhat better in this respect but introduces its own set of problems to the mix.

In any case, I agree with your sentiment (probably would not be the angle I would have taken), but I can kind of see why the authors might have decided to go this route.

edit: formatting fixes

jonromero · on Nov 26, 2012

Also, creating a HIVE/SQLish language on top of our LISP was super easy.

bsaul · on Nov 26, 2012

That's for the querying part, but how did you deal with indexing? If I understand right, you seem to precompute pretty much everything you need on the fly, so I guess that means a custom data structure that doesn't rely on SQL algebra. So no pkey/fkey-like tables and indexes that would let you write new queries after you've stored the data?

chongli · on Nov 26, 2012

Greenspun's tenth rule in action!

bsaul · on Nov 26, 2012

Since garbage collection seems to be such a big issue, does anyone know of an effort to have an Objective-C-like language with automatic reference counting and memory retain/release (like with clang) on the server side?

PS: I'm talking about Objective-C here because it's the only language I know that does it that way, not because of its features as a language.

krichman · on Nov 27, 2012

I'm pretty sure CPython reference counts objects. Also Objective-C, as in GNUstep, if you aren't using a Mac server.

EDIT: There's a downside to reference counting in that either you or the runtime does the retain/release calls very often, whereas a mark-and-sweep collector would just touch each object once per collection. So reference counting might not be faster overall, but typically avoids long pauses. What you probably really want is something like Azul's JVM, which has a kernel extension so they can collect memory concurrently, resulting in shorter pauses than sweeping and faster overall time than reference counting.
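A small sketch of the reference-counting behaviour, for anyone who wants to poke at it. It assumes CPython specifically (other Python implementations manage memory differently), and `Obj`, `freed_immediately`, and `cycle_needs_collector` are names made up for this example. It shows the two sides: an object is reclaimed the moment its last reference goes away, but a reference cycle stays alive until the separate cycle collector runs.

```python
import gc
import weakref


class Obj:
    """Plain object; instances can be the target of weak references."""

    def __init__(self) -> None:
        self.other = None


def freed_immediately() -> bool:
    """True if dropping the last strong reference frees the object at once."""
    obj = Obj()
    probe = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj  # refcount drops to zero here on CPython
    return probe() is None


def cycle_needs_collector() -> tuple:
    """Return (alive before gc.collect(), alive after) for a 2-object cycle."""
    gc.disable()  # keep the cycle collector from running mid-demo
    try:
        a, b = Obj(), Obj()
        a.other, b.other = b, a  # each object keeps the other's count above zero
        probe = weakref.ref(a)
        del a, b
        alive_before = probe() is not None
        gc.collect()  # the tracing pass finds and frees the cycle
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        gc.enable()
```

On CPython the first function should report that the object is gone right after `del`, while the cycle survives until `gc.collect()` is called. That is the usual trade: many small increments and decrements spread through the program instead of one long pause, plus a fallback collector for cycles.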

dgk42 · on Nov 27, 2012

We employ a technique similar to that of Obj-C in our memory handler implementation. You attach objects not to a parent object, but to a generation. We have 4 a-priori-specified generations per node (2 of them are stubborn and die during the de-initialization/clean-up phase of the node as a whole).
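For readers trying to picture this, here is a rough sketch of attaching objects to a generation instead of a parent object. It is a Python illustration under assumptions, not the actual memory handler: the class name `GenerationalNode`, the generation names, and the choice of which two are "stubborn" are all invented for the example.

```python
from typing import Any, Dict, List, Tuple


class GenerationalNode:
    """Hypothetical node with four generations fixed up front."""

    TRANSIENT: Tuple[str, ...] = ("request", "scratch")  # assumed names
    STUBBORN: Tuple[str, ...] = ("config", "cache")  # assumed names

    def __init__(self) -> None:
        self._gens: Dict[str, List[Any]] = {
            name: [] for name in self.TRANSIENT + self.STUBBORN
        }

    def attach(self, generation: str, obj: Any) -> Any:
        """Tie obj's lifetime to a generation rather than to a parent object."""
        self._gens[generation].append(obj)
        return obj

    def live(self, generation: str) -> int:
        """Number of objects currently attached to a generation."""
        return len(self._gens[generation])

    def release(self, generation: str) -> int:
        """Drop every object in a transient generation in one step."""
        if generation in self.STUBBORN:
            raise ValueError(f"{generation!r} only dies at node teardown")
        count = len(self._gens[generation])
        self._gens[generation].clear()
        return count

    def teardown(self) -> int:
        """De-initialize the node: every generation dies, stubborn ones too."""
        count = sum(len(objs) for objs in self._gens.values())
        for objs in self._gens.values():
            objs.clear()
        return count
```

The point of the shape is that nothing is freed object by object: a whole generation goes at once, which is what keeps the bookkeeping cheap.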

And this is another area where a functional paradigm makes sense. Mutations are "hidden": the C layer that's responsible for doing them concurrently lives in another "realm" (a monad). So memory handling is easy, and monitoring has shown us that fragmentation is kept to a minimum.

jonromero · on Nov 26, 2012

Yep, the problem was the VM. And we are going to open-source most of it.

We tried to find something that was fast, cheap to scale (in terms of servers) and easy to extend but we couldn't find any other solutions. The problem is having everything in memory and doing correlations really fast.

lucian1900 · on Nov 26, 2012

S-expressions are a nice representation for ASTs in general. At work we needed a query language for an API, so we used the data structures we had (JSON-ish) to express a little Lisp and then wrote a parser from an SQL-ish language to this Lisp.
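A toy version of that pipeline, to show how little is needed. This is only a sketch: the one-clause grammar, the `["select", column, [op, field, value]]` shape, and the function names are assumptions made for the example, not what we actually shipped.

```python
import json
import operator
import re
from typing import Any, Dict, List

# Comparison operators the toy language understands.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# Assumed grammar: SELECT <column> WHERE <field> <op> <integer>
QUERY_RE = re.compile(
    r"\s*SELECT\s+(\w+)\s+WHERE\s+(\w+)\s*(=|>|<)\s*(\d+)\s*",
    flags=re.IGNORECASE,
)


def sql_to_lisp(query: str) -> List[Any]:
    """Translate the SQL-ish text into a nested-list 'Lisp' expression."""
    match = QUERY_RE.fullmatch(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    column, field, op, value = match.groups()
    return ["select", column, [op, field, int(value)]]


def serialize(expr: List[Any]) -> str:
    """The expression is plain lists, strings, and ints, so it is valid JSON."""
    return json.dumps(expr)


def evaluate(expr: List[Any], rows: List[Dict[str, Any]]) -> List[Any]:
    """Run a select expression against a list of dict rows."""
    head, column, (op, field, value) = expr
    if head != "select":
        raise ValueError(f"unknown form: {head!r}")
    test = OPS[op]
    return [row[column] for row in rows if test(row[field], value)]
```

With this, `sql_to_lisp("SELECT name WHERE age > 30")` yields `["select", "name", [">", "age", 30]]`, which can be sent over the wire as JSON and evaluated on the other side without a second parser.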

novocaine7 · on Nov 26, 2012

It's not just a custom Lisp (which I wouldn't be too worried about), it's a custom DB.

I wonder how much engineering talent is going to get sunk into writing a DB, and whether management is going to get fidgety while the company's best talent is writing a DB rather than the product? (especially with .. two guys)