Lampsort: Leslie Lamport's Non-recursive Quicksort

lisper · on Dec 7, 2014

TL;DR: if you have an algorithm that makes two recursive calls and it doesn't matter what order you do them in (e.g. Quicksort) then you can rewrite that algorithm to use a set of sub-problems in stead of a stack of sub-problems. The latter is what you (implicitly) get when you write an algorithm recursively.

A much more interesting observation would have been to note that you could accomplish the same thing by introducing a new language construct that explicitly noted that the order of two calls doesn't matter. That would allow a compiler to automatically produce code that used a set-of-intervals data structure, or -- much more relevant in today's world -- code that used multiple cores. (But that would have defeated the real purpose of the article, which was to be a tacit plug for Eiffel, which doesn't have such a construct.)

sorbits · on Dec 8, 2014

> A much more interesting observation would have been to note that you could accomplish the same thing by introducing a new language construct that explicitly noted that the order of two calls doesn't matter

This would be the case for a pure function: http://en.wikipedia.org/wiki/Pure_function

One of the selling points of pure functional languages is the potential optimizations the compiler can theoretically do, including using multiple cores to evaluate a program, without any explicit hints from the programmer.

fiatmoney · on Dec 8, 2014

"you could accomplish the same thing by introducing a new language construct that explicitly noted that the order of two calls doesn't matter"

Effectively, this is what Golang's "go" construct does. Combined with channels you can construct an arbitrary dependency graph as well.

pacala · on Dec 8, 2014

A very common need is to do an bunch of independent RPCs. Small wish: Allow returning values from a goroutine, along the lines of "results <- go rpc(args)".

theoh · on Dec 7, 2014

That's a good observation. It should go without saying that recursion can always be expressed iteratively. Lamport is inclined to make observations like this, e.g. his sequential consistency concept which states that concurrency is always equivalent to a serial process. But don't you mean "multiple recursive calls", not specifically two?

lisper · on Dec 8, 2014

> But don't you mean "multiple recursive calls", not specifically two?

Yes, of course.

Exercise for the non-lazy lurker: can you generalize the Quicksort algorithm based on this? Is there any benefit to doing so?

comex · on Dec 8, 2014

You can if the partition uses new storage rather than swapping in-place (which is likely what you want anyway for horizontal parallelism, see below): pick multiple pivots and insert elements into a bucket depending on how many pivots they're greater than. This turns out to be called samplesort, with some extra detail regarding picking the buckets.

However, once you give up on in-place you should probably use radix sort instead, as long as your elements, or their hashes, have some semblance of uniform distribution: this avoids having to calculate greater-than on a large number of pivots, although depending on the architecture it may be possible to do the latter reasonably quickly, and I suppose you may end up saturating memory bandwidth anyway...

Anyway, you can run a single partition in parallel - just not in place - by dividing the input data and having separate output queues per core. You probably need to merge the buckets back into the original array anyway, so this doesn't hurt; even if you want to keep the buckets separated (but individually contiguous) for some reason, e.g. if you don't need the final output as an array and want to save some difficult-to-parallelize merging, it just makes the partition a bit trickier with some atomic stuff.

theoh · on Dec 8, 2014

Since sorting is inherently a one-dimensional activity, I doubt it. Philosophically speaking, binary cuts are associated with qualitative phenomena, hot and cold being one of the prototypes (light and dark is another). Bergson or Deleuze elevate this kind of thing to a metaphysical level but I think there's an underlying simplistic aspect to it. Someone gives you a line, there's not much you can do but split it into two parts.

lisper · on Dec 8, 2014

> Since sorting is inherently a one-dimensional activity, I doubt it.

Imagine you had 100 billion items to sort and 10,000 cores to put to work. Would that change your answer?

jwmerrill · on Dec 7, 2014

> His trick is to ask the audience to give a non-recursive version of Quicksort, and of course everyone starts trying to remove the recursion, for example by making the stack explicit or looking for invertible functions in calls. But his point is that recursion is not at all fundamental in Quicksort.

I'm not completely sure what distinction Meyer is trying to draw between the implementation he discusses, and "making the stack explicit". He ends up with a "set" of intervals that he pushes onto and pops from, which might as well have been a stack of arguments.

Is the point that the order you pull things out of the set doesn't matter, so demanding that it behave as a stack is overly prescriptive?

Cushman · on Dec 7, 2014

Watching the source lecture[0], it's not about how the code is written so much as how the function is specified. It's easier to describe quicksort as a few set equations without invoking the concept of recursion, but since we don't write code like that people can't get to it easily.

The nuts-and-bolts difference is in fact that it's a set and not a stack, but the underlying question is, why do we all treat the set implementation as the derived version?

[0] http://channel9.msdn.com/Events/Build/2014/3-642 ~35m

TheLoneWolfling · on Dec 7, 2014

> since we don't write code like that people can't get to it easily.

I've lately been trying to write code with no recursion - as in able to be implemented with each function having a variable indicating where to return to, no implicit stack in sight. (Statically-determinable stack size, to put it another way.)

It took me a bit to get the hang of, but I'd argue that it ends up being easier in the long run. Far easier to change the priority operator on your queue than trying to reason what behavior different orders of recursion get you, for example.

Ditto with graph search algorithms. It's enlightening to teach graph search algorithms as a single algorithm with a queue, where changing the priority gives you Dijkstra's algorithm (least cost first), DFS (LIFO), BFS (FIFO), A* (current cost + underestimating huristic of cost left), or whatever. Also easy to show that A* degenerates into Dijkstra's algorithm when you use a null heuristic (i.e. a heuristic that returns a constant) if you teach them as variants of the same algorithm.

badubu · on Dec 7, 2014

> but since we don't write code like that people can't get to it easily

which likely sums up what Lamport is trying to get across - express your work equationally, determine properties and generally understand what you're dealing with and then later, you can 'compile your equations' down to a particular implementation. from my own practice, i've found that it is much easier to reason about equations and reflect it down into code than to try and reason about the code

bkirwi · on Dec 7, 2014

Yes, that's the principal idea, I think -- separating out the implementation details from the actual conditions that are necessary for correctness. IIRC, I've seen another presentation of this where the author used that freedom to implement a work-stealing quicksort across multiple threads.

lambda · on Dec 7, 2014

  Is the point that the order you pull things out of the
  set doesn't matter, so demanding that it behave as a 
  stack is overly prescriptive?

Right. In fact, there are several parts of the algorithm that are not really fundamental; one is how you pick the pivot, and another is how you perform the partitioning (ensuring that all elements before the pivot are less and all elements after are greater).

Lamport's point is that if you look at the fundamentals of the quicksort algorithm, which are that you must pick a pivot, partition such that smaller values come before and larger values come after, and then at some point later apply the same two steps to the two intervals [start, pivot) and (pivot, end].

Any algorithm that follows that specification will correctly sort the array, and then you can tweak exactly how you do that depending on your constraints. You could do it recursively, you could do it iteratively, you could divide it up into separate threads until you have one thread per processor each of which does the recursive or iterative algorithm on a private work list, you could do a worker pool in which each worker takes intervals from the set of remaining intervals to sort, etc. And likewise, you can tweak how you pick the pivot (picking the first element in the array is usually a bad idea, as it gives worst case behavior for already sorted arrays), and tweak how you perform the partitioning.

Anyhow, he didn't go into this kind of detail, he just pointed out that thinking about the fundamental specification can help get you away from the details of a particular implementation.

TheLoneWolfling · on Dec 7, 2014

I think so.

Of course, you can make the same point about many other algorithms - graph search, for example.

The nice thing about showing that arbitrary orderings still work is that you can, for example, do parallel sorting with work-stealing, where each thread tries to grab the smallest interval from its working set at each iteration (to minimize space requirements and to improve cache locality), but other threads try to grab the largest interval possible (to reduce the amount of locking as much as possible).

Cushman · on Dec 7, 2014

If I understand the argument, the use of a set rather than a stack encodes the observation that the order of execution of adjacent partition operations isn't important. That ordering is information you don't have to store, meaning you can choose any set-like data structure, and being able to choose your access is theoretically useful: For example, by following a rule like "partition the smallest interval first", you could keep the set size ("depth") to a minimum. (That might be handy due to random pivot choice?) You don't have the freedom to choose that in a stack-backed implementation, whether recursive or "non".

So it is reasonable to argue this is a more abstract, "fundamental" implementation of the sort. Intuitively, I'm not sure it's that different from "remov[ing] the recursion, for example by making the stack explicit", then removing the stack, but it's sure interesting.

Edit: See https://news.ycombinator.com/item?id=8713160 for more context; this is correct, but the point is not really about how the code works.

tedunangst · on Dec 7, 2014

"Smallest interval first" is exactly what naive recursion gives you.

TheLoneWolfling · on Dec 7, 2014

Not quite.

Naive recursion doesn't generally change the order between the two subproblems at a single stage. It's generally just sort(first half), sort(second half).

Think of the case of a really bad pivot as the first choice - say, the second-highest element. Naive recursion generally will recurse on the lower chunk first, whereas smallest chunk first will recurse on the higher chunk.

tedunangst · on Dec 7, 2014

Fair point. I assume you get a reasonable pivot.

Recursing on one of the two halves of the "current" partition is the cache friendliest option. Smallest interval will guarantee this by induction, I think, but I've never heard that the size of the stack is really a problem in quicksort. If you continuously partition out only a single element, your runtime is going to suck, even if you "sort" those elements first.

TheLoneWolfling · on Dec 8, 2014

> I've never heard that the size of the stack is really a problem in quicksort

Naive quicksort tends to, in pathological cases, run out of stack space as opposed to having performance issues.

Smallest-interval ends up with a strict upper limit of O(log n) elements in the running set, as opposed to O(n) for naive quicksort. This can be useful.

rav · on Dec 7, 2014

Where Quicksort uses the recursion stack to maintain the `intervals` implicitly, Lampsort instead uses some set data structure to maintain this set of subproblems explicitly.

It reminds me of the way you turn recursive depth-first search (recurse on children that have not been visited) into non-recursive depth-first search (push the children that have not been visited onto the DFS stack). Calling it "non-recursive" is a bit misleading, since you're trading recursion stack space for an actual data stack to maintain the recursion you are supposed to avoid -- but it is a useful program transformation to know.

taeric · on Dec 7, 2014

There are ways to do a DFS of a tree without any sort of stack, though. This sounds more like a nitpick on stack vs set. You still have a data structure that grows in lg(n).

hyt7u · on Dec 7, 2014

> There are ways to do a DFS of a tree without any sort of stack, though

What do you mean by this? You need a way to say that the unexplored nodes we've most recently found are the ones we explore first. How do you do that (with constant time operations) without what's effectively a stack?

taeric · on Dec 7, 2014

Threaded trees was what I had in mind.[1] I believe the example there still talks of checking a visited list, though. Either a misunderstanding on my part, or on that page. I'll have to check on my books. Later, sadly. :)

[1] http://en.wikipedia.org/wiki/Threaded_binary_tree

Edit: Apologies for a quick edit. I remembered I had a short implementation on my computer. Basically, when you follow a link, you know if it was a regular link or a thread. If it was a regular link, you could still go left from the new node. Otherwise, you visit and go right. Again, keep note of whether it was a thread or not. Repeat until done. (Make sense?)

al2o3cr · on Dec 7, 2014

"Again, keep note of whether it was a thread or not."

This sounds like the visited list again...

taeric · on Dec 8, 2014

You only have to remember the link you just followed. Basically, make a boolean "followedThread" and initialize to false. Then, go to the root and follow this algorithm

    if (!followedThread && left != null) 
        set followedThread <- curNode.left.isThread
        set curNode <- curNode.left.node
    else
        set followedThread <- curNode.right.isThread
        set curNode <- curNode.right.node

Only node you really have to special case is the root node, and you can do that by making its right.node == null. Then you just do this while curNode != null.

That make sense? (Also... very possible that I made a mistake here...)

hyt7u · on Dec 7, 2014

Ah, great, thanks. I wasn't thinking about modifications to the tree.

taeric · on Dec 7, 2014

Yeah, apologies if it looks like I was intentionally misleading. Was not my intent.

oggy · on Dec 7, 2014

> Formal specification languages look remarkably like programming > languages; to be usable for significant applications they must meet > the same challenges: defining a coherent type system...

Lamport disagreed with this statement before [1], wonder what he'd say nowadays (he's been known to change his mind). Gossip warning: Lamport gave a talk at Meyer's university a few years back, bashing OOP. Meyer's a huge OOP proponent.

[1]: http://research.microsoft.com/en-us/um/people/lamport/pubs/l...

yason · on Dec 7, 2014

I hiss at the term "non-recursive quicksort" which is used a lot. Quicksort per definition is recursive logically. The same computation recurs in the subarrays, you know.

Whether you maintain the state by using the programming language's own stack, a separate stack, a queue, or some other data structure either implicitly or explicitly is a matter of implementation. The amount of space you need to allocate for the program state as a function of the size of the input is the same regardless.

In a high level enough language it doesn't matter because you are to encode, in that language, your intent to apply the same process again and again to the subpartitions until everything is sorted, and a sufficiently smart compiler could choose any implementation that best suits the underlying hardware.

mpweiher · on Dec 7, 2014

"The same computation recurs"

Hmm...you mean like in a loop?

yason · on Dec 8, 2014

Nope.

You can run, for example, bubble sort or insertion sort in a loop, always iterating over the same set of data (with some fixed amount of state) until the array is sorted.

Conversely and per definition, quicksort divides the data flow into subarrays and then sub-subarrays recursively which means that a plain single loop with constant space for state won't do. You need extra space for storing state and the amount of that extra space needed depends on the size of the input; logarithmically, in quicksort's case.

The implementation of recursion can vary but the control and data flow still does recur in a nested fashion.

joelgrus · on Dec 7, 2014

Here's a Python version I quickly threw together, for people like me who don't know Eiffel:

https://gist.github.com/joelgrus/9dc47ebb22243fe990e5

I think it's right. :)

bluecalm · on Dec 7, 2014

I am not sure what's the point is. It's still recursion it just bypasses using frame pointer and simulates its own. Here with set but stack is customary way to do that.

The idea isn't new either. Link to C implementation of quicksort without function calls: http://www.ucw.cz/libucw/#what (simple implementations is in array-simple.h). This implementation is very similar to the one in the article and is way (in my experience often more than 2x) faster than qsort or std::qsort so worth taking a look at.

trainfromkansas · on Dec 7, 2014

I'm not sure I agree with the idea that this somehow characterizes the idea of quicksort better.

Technically, you can always convert a recursive algorithm to iterative and back again (and not just with existence arguments -- it can be generally constructed), so why pick on quicksort specifically?

The only thing achieved here is putting the partitioning of the list and sorting in two separable parts. Whether that characterizes quicksort any better is a matter of opinion, I reckon.

ultimape · on Dec 7, 2014

Is this a useful thing to implement as a parallel algorithm, or would I be better of just using mergesort as my choice?

tedunangst · on Dec 7, 2014

First question was why anybody would present this using Eiffel. That's apparently answered at the bottom, but I still don't understand. Eiffel is good at being abstract or something? But isn't the code presented a concrete implementation of a single way to do things?

jerluc · on Dec 7, 2014

I think it probably has a lot to do with the fact that the author, Bertrand Meyer, invented Eiffel (http://en.wikipedia.org/wiki/Bertrand_Meyer#Computer_languag...).

vram22 · on Dec 7, 2014

Eiffel is considered by some people to be one of the better programming languages (maybe even near the top of the heap, though of course a lot depends on application area you are using it for, just as with any other language). Features in it may have influenced other languages. Not sure, but I think Design by Contract is one of them - preconditions, postconditions and invariants, which can be implemented somewhat, using assertions, in languages that provide assertions. But Eiffel provides language support for them. Search for some Eiffel success stories. There was one very good one - I don't have the link right now - about someone at Hewlett Packard using it to develop software for printers, maybe a device driver, after earlier attempts using other languages turned out to have many problems - IIRC.

Edit: after a quick search, I found an article that is about the same story, of HP using Eiffel - though I don't think it is the same article I read a while ago:

https://archive.eiffel.com/eiffel/projects/hp/creel.html

From the article:

``Eiffel is the perfect embedded language...'': an interview of Christopher Creel, HP CCD (Color laserjet and Consumables Division) How HP used ISE Eiffel to develop leading-edge printer software, used Design by Contract to preserve the work of its best designers, and in the process found bugs in its legacy software, discovered a flaw in a hardware chip, and learned a few lessons -- such as how to do in weeks what used to require months.

I've used design by contract principles (in C) in a successful middleware software that I was the team leader for, ensured (pun intended :) that my team used it extensively in the code, and the end result met its goals and was somewhat widely used in projects by the company where it was developed.

Edit 2: Bertrand Meyer is also the author of a very well-known and respected book, Object Oriented Software Construction. It's a big book. I read a lot of it, some years ago. It has some very good points in it. Bertrand Meyer is one person who seems to have thought and done a great deal about the process of software development (with a view to getting better results), on many levels, from theory through practice to industrial application.

http://en.wikipedia.org/wiki/Bertrand_Meyer

http://en.wikipedia.org/wiki/Object-Oriented_Software_Constr...

coldcode · on Dec 8, 2014

Better is always subjective. Eiffel is not exactly in common usage in real programs. Lots of people think their language is better at something but until lots of people start using it, we don't really know.

vram22 · on Dec 8, 2014

>Better is always subjective. True, and I implied that in my first sentence. It is well known that Eiffel is not used by a lot of people. That does not automatically mean it is not good, though.

ape4 · on Dec 7, 2014

How does it compare speed-wise?

danbruc · on Dec 7, 2014

require j >= a.upper seems wrong.