To the author of this article: Could you run your node test with "node --trace-gc <script>"? That will output when, and for how long, node's GC is doing its thing.
Anyway, this could be a legit complaint at this point.
It explains why JSON parsing doesn't perform well, but doesn't explain the original benchmark results. Would be great to get a comment from @mraleph or the other V8 guys on that.
1) The one with Buffers also causes mark-sweep/compact pauses (7-15 ms each), because the Buffer constructor calls AdjustAmountOfExternalAllocatedMemory, which triggers a full GC cycle if V8 thinks too much memory is held outside the VM (see the sketch below).
2) GCs in the string-based benchmark are mainly scavenges taking ~0 ms, plus fewer than 10 full collections taking <6 ms each on my desktop.
That is all I can say. V8 GC is performing well here from my point of view.
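If anyone wants to reproduce the Buffer effect from 1), a throwaway script along these lines (my own sketch, not the benchmark code) makes it visible:

    // Buffer allocations count as external memory; enough of them pushes
    // V8 past its external-allocation limit and forces full GC cycles.
    // Run with: node --trace-gc this-script.js
    var live = [];
    for (var i = 0; i < 50000; i++) {
        live.push(new Buffer(25 * 1024));    // ~25KB per allocation
        if (live.length > 100) live.shift(); // keep a bounded working set
    }
    console.log('done');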
Yes, --trace-gc shows about 10 mark-sweeps per second, each taking around 13 ms (no compacts, as far as I could see), which works out to roughly 130 ms out of every second. But is that ~15% spent in GC enough to explain the performance gap?
You were saying that V8's GC is failing here, so I just explained why JSON.parse is especially hard on V8's GC.
Strictly speaking, I am not even convinced that GC is the bottleneck here. Only profiling can reveal the real bottleneck.
[I tried a small experiment: I used a third-party pure-JS JSON parser instead of V8's JSON.parse. That changed the GC profile, but did not affect response times.]
V8 has a very impressive garbage collector (stop-the-world, generational, accurate), and the GC is probably a part the Google team has spent a lot of time tuning and working on, as it's one of the hardest and most important parts of building a VM...
My guess is that node's GC configuration isn't finely tuned for 25KB structures or maybe the GC is called prematurely.
Some suggestions: try turning off the GC and re-running the benchmark, try smaller JSON data structures, try different versions of node. Each of these would give more evidence about where the problem is; a sketch of the second experiment follows below.
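For the smaller-structures experiment, an untested sketch along these lines would do (the payload shape and all names here are my own guesses, not the benchmark's):

    // Time JSON.parse over payloads of growing size to see whether the
    // slowdown scales with the size of the parsed structure.
    function makePayload(props) {
        var obj = {};
        for (var i = 0; i < props; i++) {
            obj['key' + i] = 'short value'; // ~10-character strings
        }
        return JSON.stringify(obj);
    }
    [10, 100, 1000].forEach(function (props) {
        var json = makePayload(props);
        var start = Date.now();
        for (var i = 0; i < 10000; i++) {
            JSON.parse(json);
        }
        console.log(props + ' props (' + json.length + ' bytes): ' +
                    (Date.now() - start) + ' ms');
    });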
Btw. in that benchmark: which versions of RingoJS and node.js are used? How much memory does each server use in the end?
Edit:
What type of garbage collector does RingoJS/Rhino use - how is the GC configured for RingoJS?
The JSON I'm parsing is just objects with short string properties (around 10 characters each). There's just one longer 25kb JSON string, but that one is never collected. As for Node configuration, can you suggest specific options to use? I've been asking about this on #node.js (and asked ryan), and I'm open to any suggestions.
Ringo is running with the server hotspot JVM without any further options.
By default, Java 6 uses a generational collector: multi-threaded stop-the-world copying for the young generation, and single-threaded stop-the-world mark-sweep-compact for the tenured generation.
I think the editorializing of the headline is unnecessarily negative (read: biased). There is obviously an issue Ryan is working on addressing, but Node is clearly already providing performance suitable for many server workloads.
I'm the author of both the original article and this HN posting - and yes, I am biased, since I'm the main developer of RingoJS (the other platform in that benchmark). I've made that quite clear and provided additional background in the original benchmark to which this is just a short update: http://hns.github.com/2010/09/21/benchmark.html
I think my benchmark and the conclusions I draw from it (after a lot of thinking) are fair. My intention is just to make people see there's no magic bullet with performance or scalability, and that there are alternatives for server-side JavaScript.
I think your conclusions in the article are fair. I think the title on HN is misleading because it's a quantitative issue.
V8's GC is a well-known concern in the Node community, but it's still performing well enough that Node is considerably faster than traditional servers (like Apache). The fact that Ringo is also faster doesn't make V8 "not ready"; it just means it could be improved.
If I wanted to be contentious, I could suggest that "Hacker News comments confirm that RingoJS may not be ready for developers", because the author likes taking pot shots at other frameworks. But that would be petty, wouldn't it?
You are right about the title. That "not ready for the server" is a foolish phrase. I'd change it to "not tuned for the server" if I could, but it looks like it's impossible to change that now.
First, the response time variation is an important observation, thanks for that. To make sure it is caused by V8's GC, we need a GC log from the benchmark.
Second, to make this a fair comparison, you need to use similarly sized heaps. It could be that the JVM heap was large enough to run the whole test without a major GC. We need a GC log for this part of the test as well (e.g. via -verbose:gc).
Sun was aware that server and client use need different GC strategies and lets us choose between them. V8, however, seems to be optimized for the client side.
I'm not yet convinced this is due to GC. I sent a pull request[1] to Hannes to use the faster Buffer technique, to at least rule out interference from V8's crazy slow string juggling under load.
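For reference, the Buffer technique is roughly this (my paraphrase of the idea, not the actual contents of the pull request):

    // Encode the response body to a Buffer once, instead of having node
    // convert the string to bytes on every write.
    var http = require('http');
    http.createServer(function (req, res) {
        var body = JSON.stringify({hello: 'world'});
        var buf = new Buffer(body); // one up-front UTF-8 encode
        res.writeHead(200, {
            'Content-Type': 'application/json',
            'Content-Length': buf.length
        });
        res.end(buf);
    }).listen(8000);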
Has anyone done any experiments with node.js and Jägermonkey? They’re getting pretty close to V8 in speed ( http://arewefastyet.com/ ) and might prove better for server use (utter conjecture on my part).
But none of these seem to be low level enough to make a reasonable comparison.
Flusspferd ( http://flusspferd.org/ ) seems most like node, but uses SpiderMonkey (circa Firefox 3.5, it seems, though it's actively maintained, so there's hope for future enhancements). Unfortunately, it doesn't yet seem to have all the handy web-serving stuff that node has, so it's probably still not useful for a competitive benchmark.
I talked with Brendan Eich in New York after JSConf.eu and he was suggesting they might be looking at porting the V8 API to Jägermonkey after they release it so that it can run Node.
Just an educated guess: if you're allocating tons of objects and strings and your app gets slow, it's very likely the GC. But I don't know V8 well enough to say for sure.
You're interpreting the graph incorrectly. There's no "bottom of the curve"; the graph shows a distribution, not some property over time. A correct reading tells you, for example, that out of 50'000 requests, a total of ~30'000 completed in 100 ms or less. Similarly, for what you call the bottom of the curve: 5'000 out of 50'000 requests completed in ~25 ms or less. This does _not_ say which 5'000 requests those were, and in particular it does _not_ imply that the _first_ 5'000 requests were faster. For that you'd have to plot response time over time.
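A quick way to convince yourself (illustrative snippet, the numbers are made up):

    // Two runs with identical response-time distributions but opposite
    // orderings produce exactly the same cumulative plot.
    function distribution(times) {
        return times.slice().sort(function (a, b) { return a - b; });
    }
    var fastFirst = [10, 10, 10, 200, 200];
    var slowFirst = [200, 200, 10, 10, 10];
    console.log(distribution(fastFirst).join() === distribution(slowFirst).join()); // true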
Not only that, but one of Node's big selling points has been the supposedly flat response time distribution it achieves with a simple 'hello world' example.
Clearly that's not the case once Node/V8 needs to do a real-world amount of work per request.