Hacker News | pphaneuf's comments

Why would there be extra overhead when using edge triggered? There's definitely extra complexity on the client side, but it's close to what you're trying to do with super-poll (the extra complexity is basically to find out when an fd isn't busy anymore).

I think it might even be faster, kernel-side. From what I remember of the implementation, both modes have to walk the same list of ready fds, but that list is shorter in edge triggered mode, because they get removed from the list as it goes.

Edge triggered might have more overhead if many fds flip between ready/not-ready quickly, but that's quite the wacky situation (and with an even distribution, it would put your ATR around 0.5, so you'd probably still be winning).


Your results seem correct to me, even though I'm a bit disappointed by the actual number (I was hoping for 0.8 or 0.9, not 0.6), and the code looks good. I've been meaning to run it myself, but haven't had the time; I should get on it shortly.

I'm still of the opinion that a >0.6 ATR means you're fucked anyway (where processing all those fds can't be done as fast as the data arrives), but I'm trying to come up with a good small test for that hypothesis (like you did).

Until I do, it's just an opinion/hypothesis, and it's kind of hard to argue about it strongly either way.


Right on. Myself, I always figured epoll was a bit slower than poll when most/all of the fds are ready all the time, from reading the code.

Now, there's an experiment to figure out what the threshold is, and frankly, it's a bit lower than I had hoped (I would have guessed 0.9, 0.8 at worst), which is exactly why having the actual data is awesome.


I've never perceived epoll as being O(1), always O(N), the big thing being that N is the number of events dispatched, rather than the number of watched fds.

I'm fine with the amount of time taken scaling linearly with the amount of work to do, I'm not cool with it scaling linearly with the number of flowers in the garden. :-)


Hey, some idea I remembered (from some old paper): don't go from epoll to poll when the ATR gets too high, go to... NOTHING!

Just read() or write(), as if something told you they were ready! Your code has to be able to handle EAGAIN anyway, right? If you're at 1.0 ATR, you're saving the whole cost of poll or epoll!

I'm actually serious here, BTW. It would need some measurement to figure out the threshold at which this is okay, and if it's close enough to 0.6, I'm guessing you could go straight from epoll to nothing, and take the small (potential) performance hit between 0.6 and 0.X.

Hmm, I'm an idiot: this is basically epoll in edge-triggered mode.


Adding/removing from the poll() array is more or less free, but with epoll, you have to use epoll_ctl(), so you'd definitely want some hysteresis to avoid bouncing things around too much.


Interesting, you're like the 3rd or 4th person who's mentioned hysteresis.


It's a fairly common idea.


As in "only the 3rd or 4th" (so I'm possibly clever), or as in "a bunch of people"? I don't know your sample size. :-)


Question: as the ATR goes higher, so would the proportion of time spent in poll or epoll, no?

So if you have a thousand fds, and they're all active, you have to deal with a thousand fds, which would make the difference between poll and epoll insignificant (only twice as fast, not even an order of magnitude!)?

This would make the micro-benchmark quite micro! Annoyingly enough, I think that means the real way to find out would be an httperf run with each backend. A lot more work...


The best would be if it were possible to code up superpoll to be adaptive and, in effect, benchmark itself to come to the same conclusion dynamically. Then if one day the kernel people fixed epoll to be better all the time, Mongrel2 would magically stop using poll() much on systems with that kernel, and favour epoll.

Of course, that's often Kinda Hard To Do (tm). ;-)


I think what he's saying is that it should be pretty easy to switch between poll, epoll, and "superpoll", since you can easily abstract a common interface. Then you could worry about the performance of that particular bit only when you actually hit it, and use your time arguably more effectively on the actual features needed to productionize Mongrel2.

I sort of agree with him, except with an important detail: this question is basically about prioritization of your time, and I'd say that this is nobody's business but yours! You can optimize memcpy() all day, for all I care. ;-)

There's one aspect where you'd be quite right to do some investigation before implementing: if the outcome changes the interface you'd need to implement.

For example, here's a hypothesis: using epoll's edge-triggered mode could drastically reduce the number of events, since you only get an event when an fd first becomes readable/writable, instead of every time it's in that state. Since epoll is O(N) on the number of events returned (not on the number of fds that are currently readable/writable), you'd lower the effective ATR a whole lot. In fact, a really busy server would have fewer events, since a readable fd stays that way for longer when data arrives at a great rate (the write-side story might be less brilliant). You'd also have to make far fewer calls to epoll_ctl, since you could just stop caring about the reading side while you're trying to write the last batch of data on the other side (no need to remove it from the interest set, you won't get events for it). You only need to set it when flipping from read to write, and the other way around (after receiving/sending headers and bodies).

Now, if that's true, that's a big deal, because now you have to change your design a fair bit. You have to remember that an fd is readable until you get EAGAIN from read() yourself, so there's some more state management, moving that fd from one list to another, etc. Finding out that this would be a million times faster (or slower!) now would save you a ton of work, either way.

But finding out whether poll or epoll is faster, or a hybrid solution with the same interface? Meh, it could wait.

(about my hypothesis, that's actually what Davide Libenzi designed epoll for, which might explain some of the weirder bits)

