Vert.x vs node.js simple HTTP benchmarks

For a bit of fun, I decided to do a little bit of micro-benchmarking of Vert.x vs node.js HTTP performance.

Firstly, a disclaimer: This isn’t rigorous benchmarking, and I haven’t attempted to benchmark a wide range of use cases (I just test some HTTP stuff here). All the benchmarking is done on a single machine (my desktop). This is not ideal – a good benchmark would have clients and servers on different physical machines and a real network between them. So don’t read too much into these results. (You can read a little bit, just not too much ;) ) In the future, when I can get hold of some real hardware I intend to do some real benchmarking. Until then this will have to do.

Apparatus: All tests were run on my desktop: An AMD Phenom II X6 (that’s a six core, not as good as the latest Intels but pretty good), 8GB RAM (although only a fraction was used in the tests), Ubuntu 11.04.

Versions: vert.x-1.0.final, node.js 0.6.6

The tests:

Test 1 tests the performance of a trivially simple HTTP server which returns a 200-OK for every request.
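
As a rough illustration only, a node.js server of the kind Test 1 describes might look like the sketch below. This is not the benchmark code itself; the names `handleRequest` and `startServer` are hypothetical and chosen just for this example.

```javascript
const http = require('http');

// Answer every request with an empty 200 OK, regardless of method or path.
function handleRequest(req, res) {
  res.writeHead(200);
  res.end();
}

// Hypothetical helper: start the server on the given port and return it.
function startServer(port) {
  return http.createServer(handleRequest).listen(port);
}
```

Calling `startServer(8080)` would start it listening on port 8080.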

Test 2 tests a slightly less trivial HTTP server which serves a real static file from the file system. The file is 72 bytes in size and just contains this:

<html>
  <head>
    <title>Some page</title>
  </head>
  <body>
    <h1>Foo!</h1>
  </body>
</html>

The client used was written in Java using Vert.x, and fires requests using 10 connections, and a maximum of 2000 pipelined requests at any one time per connection. I used 6 instances of the client so that means 60 connections in total.
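
Pipelining here means writing several requests on one connection before any response has come back. The sketch below shows the idea in node.js over a raw TCP socket. It is an assumption-laden illustration, not the benchmark client (which is written in Java using Vert.x); `buildPipelinedRequests` and `sendPipelined` are hypothetical names.

```javascript
const net = require('net');

// Build `count` HTTP/1.1 GET requests back to back as one string.
function buildPipelinedRequests(host, path, count) {
  const one = `GET ${path} HTTP/1.1\r\nHost: ${host}\r\n\r\n`;
  return one.repeat(count);
}

// Open one connection and write all requests before reading any response.
// `onData` receives the raw response bytes as they arrive.
function sendPipelined(host, port, path, count, onData) {
  const socket = net.connect(port, host, () => {
    socket.write(buildPipelinedRequests(host, path, count));
  });
  socket.on('data', onData);
  return socket;
}
```

A real load client would also cap the number of outstanding requests per connection (2000 in the setup described above) and count the responses as they are parsed.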

Every 2000 responses received it sends a message on the Vert.x event bus which is picked up by another component which computes the overall and average rates and prints them to the console every three seconds.

Each test was run for 1 minute, and the average (mean) rate taken.
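
The mean rate is simply the total number of responses divided by the elapsed time. A minimal sketch of that bookkeeping follows; `RateTracker` is a hypothetical name and this is not the component used in the benchmark, which receives its counts over the Vert.x event bus.

```javascript
// Hypothetical rate tracker: clients report batches of responses,
// and the tracker derives the mean rate since it was started.
class RateTracker {
  constructor(startMs) {
    this.startMs = startMs;
    this.total = 0;
  }

  // Record a batch of completed responses (for example 2000 at a time).
  record(count) {
    this.total += count;
  }

  // Mean responses per second between the start time and `nowMs`.
  averageRate(nowMs) {
    const elapsedSec = (nowMs - this.startMs) / 1000;
    return elapsedSec > 0 ? this.total / elapsedSec : 0;
  }
}
```

For example, three batches of 2000 responses recorded over three seconds give a mean rate of 2000 responses per second.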

The code is here if you want to see it or try it out yourself.

For comparison I ran each test against three server setups:

  1. A single node.js server
  2. A single Vert.x server process with 6 event loops.
  3. A node.js server with six child processes forked using the `cluster` module to distribute work out to them

One of the advantages of Vert.x over node.js is a single server instance can utilise all the cores on your server without having to manually fork child processes and without you having to write load-balancing code to farm out requests to the children.
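
For readers unfamiliar with it, manually forking child processes with the node.js `cluster` module looks roughly like the sketch below. This is an illustration under my own assumptions rather than the benchmark code; `startCluster` is a hypothetical helper and the worker count and port are example values.

```javascript
const cluster = require('cluster');
const http = require('http');

// Hypothetical helper: in the parent process fork `workerCount` children;
// in each child start an HTTP server. The children share the listening port.
function startCluster(workerCount, port) {
  // `isPrimary` is the newer name; older node versions only have `isMaster`.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
  } else {
    http.createServer((req, res) => {
      res.writeHead(200);
      res.end();
    }).listen(port);
  }
}
```

Each forked child re-runs the same script, so a call such as `startCluster(6, 8080)` would sit at the top level of that script.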

For the Vert.x servers I tested a version of the server for each of the languages we support: currently JavaScript, Ruby, Groovy and Java.

Results:

In this test, the server simply returns 200-OK to each request. I tested Vert.x Java, Ruby, Groovy and JavaScript single server processes running with 6 event loops, against a node.js single server process. Since a node.js server only has 1 event loop, I also show the results for 6 node.js server processes using the cluster module.

As you can see a single Vert.x process hugely outperforms a single node.js process, and also greatly outperforms a cluster of node.js instances.

In this test, it serves a small static file. I tested Vert.x Java, Ruby, Groovy and JavaScript single server processes running with 6 event loops, against a node.js single server process. Since a node.js server only has one event loop, I also show the results for 6 node.js server processes using the cluster module.

For the node results I show the results both for using the `readFile` function to load the file, and also for using streams. For streams – I show three variations – one using a blocking call to stat the file to get the size, another using a non blocking call to stat the file, and a third using chunked transfer encoding (so it needs no stat).
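
To make the variations concrete, here is a sketch of three of them: `readFile`, a stream with a blocking stat, and a stream with a non-blocking stat. The function names are hypothetical and the code is my own illustration of the variations described, not the code that produced the results.

```javascript
const fs = require('fs');

// Variant A: load the whole file into memory on every request.
function serveWithReadFile(path, res) {
  fs.readFile(path, (err, data) => {
    if (err) {
      res.writeHead(500);
      res.end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/html', 'Content-Length': data.length });
    res.end(data);
  });
}

// Variant B: blocking stat to get the size, then stream the file.
function serveWithStreamSyncStat(path, res) {
  const size = fs.statSync(path).size;
  res.writeHead(200, { 'Content-Type': 'text/html', 'Content-Length': size });
  fs.createReadStream(path).pipe(res);
}

// Variant C: non-blocking stat to get the size, then stream the file.
function serveWithStreamAsyncStat(path, res) {
  fs.stat(path, (err, stats) => {
    if (err) {
      res.writeHead(500);
      res.end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/html', 'Content-Length': stats.size });
    fs.createReadStream(path).pipe(res);
  });
}
```

The chunked variation drops the stat entirely and sends no `Content-Length`, which is why it needs no size up front.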

Again, a single Vert.x process hugely outperforms a single node.js process, and also greatly outperforms a cluster of node.js instances.

Summary:

I don’t think I need to say much here. The numbers speak for themselves.

[Edit. I initially wasn't going to provide any summary, but several observers have said I should summarise the results. So here's my summary:]

1. Vert.x is much faster than Node.js when reading the same file from the filesystem more than once. Neither Vert.x nor the JVM is doing any explicit caching, so this is most likely due to the OS caching it (memory mapped files?). In a webserver-type app the same file is likely to be served many times, so this is pretty significant.
2. Node.js doesn’t appear to scale particularly well using the `cluster` module. The advice from the node.js folks is not to use it (for now). This leaves node with no out-of-the-box way to scale across multiple cores :(
3. Many of the out of the box load testing tools don’t appear to pipeline very much. The Vert.x results are dramatically better when there’s a high degree of pipelining going on.
4. Java on the JVM is extremely fast. No surprises there ;) But what is very encouraging is how the other JVM langs held up – in particular Rhino held up very well against V8 :)

[Second Edit: As I said in my replies, I should never have posted the results with Vert.x "crippled". No-one is going to do this in real life. They will just be running Vert.x or Node.js on the same hardware in whatever way they can get it to exploit the available cores. This really makes the "crippled" Vert.x results redundant. So I've removed them. ]

151 Responses to Vert.x vs node.js simple HTTP benchmarks

  1. essdee says:

    Any possibility of a memory usage graph(s)

  2. Adam Fisk says:

    Love it. The sketchy node performance claims have been bugging me forever and the associated lack of benchmarking appalling really. Thanks!

    • Check the “benchmark” folder in the Node.js source. We run benchmarks very regularly, mostly using ab.

      • Adam Fisk says:

        Ah apologies — I haven’t dug that deep, but great to hear! Google searching on the topic has turned up shockingly little in the past, although it’s been a few months since I last gave it a shot.

  3. Nice benchmarks Tim. I’m currently using node.js for front-end HTTP event collection but I’ve had my eye on using vert.x for the last week or so since finding out about it. Ideally I’d like to use vert.x with Scala, so…when will the Scala support be ready? :-)

  4. Tim Wang says:

    I wish Python is supported…

    • Tim Fox says:

      Python support is on the TODO list. Again, it’s a matter of resources and how we prioritise it against other tasks. We have Small team syndrome ;)

    • Salim Madjd says:

      I second the python support.

      • Tim Fox says:

        We are looking for people with Python experience to implement the Python API (this is a shim over the Java API, in Jython).
        If you are interested in volunteering, come see our task list and join us on the google group.

  5. gandalfu says:

    As a baseline, i would like to see how it compares to apache serving the static hello page.

  6. ranjan says:

    it’s actually memory consumption that i’m interested in …

  7. Why are you using the deprecated `"binary"` encoding argument? That’s causing a lot of extra copying in the Node.js server. Also, it’d be interesting to see how this performs on 0.6.17, since 0.6.6 is a few months behind at this point.

    It’s a bit frustrating that the benchmarks use your own vert.x client rather than a well-established benchmarking tool such as ApacheBench or Siege. That makes it harder to know exactly what these numbers mean, and harder to trust that they’re objective. In looking at the time that the Node.js server is spending, the overwhelming majority of it is in reading the file, at least on my machine, and is not very much slower than a simple C program that reads the file repeatedly. A “how fast can you read this file” test would be interesting.

    In general, if a blog post starts out with a disclaimer saying “This isn’t a rigorous benchmark,” then you probably ought to stop there, and go make it rigorous. You should not publish results that you don’t want people to read into. Anyone who prefers the JVM to Node.js will say, “Aha! This proves it!” and anyone who prefers Node.js to the JVM will be tempted to read it as a cheap attempt to garner attention using bogus benchmark shenanigans.

    It’s a bit like if I’d started this comment by saying, “Disclaimer: This isn’t a rigorous comment. I haven’t read the whole article.” Anyone who already agrees with me will keep agreeing, and anyone who doesn’t will never take me seriously, so I would’ve lost 100% of my effectiveness, and only stirred up a bit of controversy. You say “the numbers speak for themselves”, but you haven’t even shown the numbers, or any profiling information, or even any theory at all about where the 10x discrepancy is coming from, only the colored bars. As it is, I am extremely skeptical, and frustrated because it would be very exciting to find a 10x latency bubble in node. (I’ve basically given up hope that such massive improvements still exist!)

    If your thesis was “The JVM is faster than V8”, well, that’d be somewhat interesting, if the Node.js server was actually spending any considerable time in JavaScript. But almost all of its time is spent reading the file from disk. How can Vert.x make a hard disk 10x faster? (Perhaps it’s not actually hitting the disk on many of those requests?)

    Do the rigorous benchmark. You don’t need more badass hardware. You just need to use a testing tool that is not written by you, and share the numbers rather than the colored bars. Profiling where the time is going would also add a lot of weight to your claims.

    • validatorian says:

      This.

    • broofa says:

      Like.

    • Tim Fox says:

      Thanks Isaac,

      I’ve rerun the tests using both readFile with the binary encoding dropped, and also using streams. This does increase the node results a little, but they are still way behind the Vert.x results.

      See the graphs for the results.

      Regarding the client – the code is in github and very straightforward, feel free to view the code or run it yourself. Nothing to hide there.

      About disk – the disk won’t be accessed much, the operating system will keep the file cached in its own cache after the first read. Subsequent reads from Vert.x or node will be reading from the cache – so the disk speed isn’t really relevant.

      If you want the raw numbers I will share the link to the google doc.

      And if you are in doubt, the code is in github you can run it yourself.

      • Zman says:

        Isaac can read the graphs just fine, the problem is that you are using a testing tool you wrote yourself instead of a standard already out there, and are not logging enough info about what is going on in the process (ie. memory consumption, i/o times, cpu usage etc.)

      • In your benchmarks nodejs is reading off the disk for every request. That’s the reason for the dramatic performance difference. Monitoring disk ops you can see this as well. Seems the JVM is doing some caching that v8 isn’t for reading the same file.

        Also making a small change so that both Vert.x and node.js only read the file once using this:

        Then node.js ends up performing substantially better than Vert.x. I changed both Vert.x and node.js apps to match so it would be apples to apples comparison.

        The results I got had node.js outperforming Vert.x by 50%.

        Anyway this goes to show how misleading micro benchmarking can be.

        [Edit by Tim Fox - please see my reply to similar message posted elsewhere on this thread for an answer to this]

    • jonanin says:

      This so many times. These benchmarks mean nothing and your process is flawed. Try again.

    • bhauertso says:

      I can’t speak to Siege, but I will just put this out there: I suspect one reason Tim created his own client is that ApacheBench is a single-threaded tool. Until it is updated to use more than a single CPU core, it cannot adequately stress-test a high-performance server.

      The only way to use ApacheBench to test a high-performance server is to launch multiple ApacheBench instances simultaneously and painstakingly correlate the results. That process is painful and error-prone.

      That said, colleagues of mine have independently benchmarked several HTTP frameworks (including vert.x and node.js) using ApacheBench, despite the limitations we face in doing so. While across the board our numbers are lower than what Tim has printed here, they are consistent with the overall flavor: vert.x can process more requests per second than node.js. Considerably more.

      • John Kew says:

        Yeah, I agree. AB is great for simple one-offs, but I never use it for serious testing. Actually I’ll pretty much only use it to warm up a server. httperf from HP is considerably better, but I’m not completely surprised that he wrote his own.

    • Just wanted to confirm your guess that the Vert.x version isn’t hitting the disk, while the node.js server he is testing against IS hitting the disk for each request. When changing it so that the disk isn’t hit each time, node.js outperforms Vert.x

      • Tim Fox says:

        It’s the OS doing the caching not Vert.x/JVM.

        If node.js chooses to bypass OS cache, if anything, that’s a shortcoming in node.js, not vert.x!

        I tested using your code, using 1 node.js core, 6 node.js cores (using cluster), 1 vert.x core, and 6 vert.x cores and get quite different results:

        1 node.js core: 27693k
        6 node.js cores (cluster): 57596
        1 vert.x core: 49366
        6 vert.x cores: 174892

        See https://gist.github.com/2653345

        The vert.x results will be even higher if you use the java server rather than the js one.

      • Tim Fox says:

        All you’ve really done is given us evidence that the node.js file system implementation isn’t too great.

      • Try replicating those results on the same hardware. m1.small instance on EC2. It’s a single core machine which prevents any under the hood uses of other cores. I suspect you will get the same results. I used a fresh instance with nothing on it but nodejs and vert.x.

        Seems that you are either overlooking something or being disingenuous.

        In any case, if this makes anything clear, is that benchmarks need to test more than one simple small task.

      • @TimFox Lets go with that for a second and concede the point for the sake of discussion.. nodejs has a not so great filesystem implementation.. if that’s the case then your benchmark simply exploits that weakness for the sake of getting impressive benchmark results.

        Again this is the problem with microbenchmarks. You are only testing one specific thing, opening a file, reading it. I suspect people are not going to be doing much static file serving with either of these pieces of software. There’s nginx for that.

      • Tim Fox says:

        Benchmarking on EC2 is a bad idea. It’s a virtualised environment and you don’t have control over who is using the underlying CPU or network bandwidth.

        Secondly… if someone cares about perf they’re not going to be running their app on a small one core EC2 instance. That’s about as slow as a 386 from 1994. ;)

      • Tim Fox says:

        Disingenuous implies I knew about the poor filesystem support in node.js before I started, I can assure you that’s not the case.

        You say the node.js isn’t really used in real-life for serving files – there may be some truth in that. However it’s not true to say the node.js fs support isn’t used, in general, in node.js apps, so a benchmark that includes node fs support isn’t without value.

        I do however agree with your general statement about microbenchmarks, and that is why I put the bit disclaimer on the top of the article in the first place. :)

      • I used that instance type because it only has one core and your results are not consistent with what they should be if it’s using one core. When I run them on a multicore server (even when configuring it to use 1 process) the results don’t add up. Running the benchmark on a single core server prevents any shenanigans.

        The results running back to back produce consistent results for me. Unless coincidentally ec2 is adjusting my available resources to precisely the same level exactly when I’m running each benchmark. ;)

      • Tim Fox says:

        Please see my last reply to Isaac for why limiting to a single core is silly.

      • Tim,

        “Disingenuous implies I knew about the poor filesystem support in node.js before I started, I can assure you that’s not the case.”

        Now, now, no reason to get snippy.

        Node calls read(2) when you call fs.read(). That’s not “poor performance”. That’s “not being magical.” Vert.x does not actually read the file when you ask it to. This is sacrificing correctness for the sake of speed.

        That’s sometimes a valid choice, of course, but it’s not the one Node.js makes, and it makes the benchmark numbers wildly invalid.

        As I suggested in my previous comment, this should really just be a benchmark of “how many times can you read the file per second”, and Vert.x wins by not actually reading the file. (As in, not copying the file contents each time into a new location in user space.)

        Courtney’s findings indicate that even with `-instances 1`, it’s still using more than one CPU. It can also be limited to a single CPU using the pbind command, and the results are consistent with Courtney’s EC2 tests. The CPU time per second on the Vert.x `-instances 1` test shows about 1700ms of CPU time per second, indicating that it cannot possibly be using only one CPU.

        What would convince you that this benchmark is invalid?

      • Tim Fox says:

        Your point about “correctness” makes little sense. The correct file is delivered to the client, that’s what matters. Whether node chooses to bypass OS cache and read it from the disk each time does not make it any more correct, just slower.

        And why the obsession with limiting to just one core? As I said in other replies, no-one who cares about performance is going to have just one core to play with, and results show Vert.x scales better over cores.

        The point is the user has X (>> 1) cores and they have two pieces of software installed – Vert.x and Node.js – the question is which one can perform better given that hardware.

        Even if you could show that Node.js performs better (when fs is disabled) using just one strict core, all you’ve done is shown that it performs better on a machine that no-one uses (and if they are cheap enough to use one they certainly don’t care about perf).

      • Google suggests that java nio does mmap’d reads. If node is doing read(2), then this could explain some of the differences.

      • You say: “For this reason, I have added the `crippled` Vert.x server which is told specifically to only use one core – this makes it easier to compare to a single node.js instance.

        This is the same as the previous test, but this time I ‘crippled’ the Vert.x servers so they only use one core, the same as node.js. You probably wouldn’t do this in real life, but it makes for a good comparison.”

        then say: “Even if you could show that Node.js performs better (when fs is disabled) using just one strict core, all you’ve done is shown that it performs better on a machine that no-one uses”

        So does it make for a good comparison to see how Vert.X performs being limited to a single core or not? If so, then really make sure Vert.X is only using 1 core. If not, simply state that you are not going to compare Vert.X using a single core to node.js because your results above are simply not consistent with what you would get when Vert.X truly is limited to a single core.

        If you want to compare what using many cores and using a single core will result in, sure, you’ve demonstrated what everyone knows. Efficient use of a multicore system will beat a single core process on a multicore system. I’d be curious what using haproxy to balance between N-node processes would perform like compared to Vert.X. I suspect the performance gap in my results would pretty much match the results I got so far. This is not to say that Vert.x isn’t neat, but it seems even when corrected you simply get defensive about your results.

        In any case, this benchmark simply exploits the differences between node and vert.x. node isn’t being used how it would in real world and thus you come up with results that are all but completely irrelevant. If you were being honest here you would say: “poorly written node code will run much slower than correctly written vert.x code”. But that’s pretty useless.

        Judging from your responses when people have pointed out that your statements like this: “the disk won’t be accessed much, the operating system will keep the file cached in its own cache after the first read. Subsequent reads from Vert.x or node will be reading from the cache” were wrong. and respond:

        “The correct file is delivered to the client, that’s what matters. Whether node chooses to bypass OS cache and read it from the disk each time does not make it any more correct, just slower.”

        to me says that you are less interested in honest benchmarks and more interested in being right.

      • Tim Fox says:

        As I’ve already said in another reply, I wish I’d never published the Vert.x results with a single core.

        Regarding “the disk won’t be accessed much, the operating system will keep the file cached in its own cache after the first read. Subsequent reads from Vert.x or node will be reading from the cache”

        That statement is _not_ wrong. That’s exactly what is happening. If node.js decides to use a different api to access the file and not benefit from OS caching, that’s up to node.js.

      • By the way… for giggles I tested vert.x vs nodejs on:

        7 GB of memory
        20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
        1690 GB of instance storage
        64-bit platform
        I/O Performance: High

        Used 8 instance vert.x running against nodejs cluster. I changed the fs.read as well so it doesn’t do disk access each time. (using the same change as I posted before).

        Results:

        concurrency 1x: vert.x 11.3% faster
        concurrency 10x: nodejs 27.2% faster
        concurrency 100x: nodejs 16.9% faster
        concurrency 1000x: vert.x CRASHES

        So it seems even in multi-core environments node outperforms when doing the same work as vert.x unless you are only accessing serially.

        Full results available here: https://gist.github.com/2657432

      • Tim Fox says:

        Why are you disabling disk access?

        I’ve said (a few times now) – Neither Vert.x nor the JVM caches the disk, it is acting correctly. Therefore there is no reason to disable disk access.

        So let me get this right. You’re disabling the part in node.js that is particularly slow, then claiming that Node.js is now faster than before. Well… no shit!

        Regarding your results. It’s almost impossible for me to tell what you’re actually doing here – but judging from the poor rates it looks like the level of pipelining is very low. I’m not sure most common http tools are very good at loading up pipelining.

        If you used the client I provided I suspect the results would be quite different.

        Vert.x crashing? I can’t tell without seeing your setup, but I’d guess it requires more RAM. With a JVM process it doesn’t just expand to fill all available RAM, you have to tell it via server settings how much to use (max/min) at the outset. This is normal practice.

      • FYI – I’m disabling disk access for both vert.x and node.js. So if really that’s the only way you can get vert.x to surpass node.js, then let’s call it what it is.. your benchmark exploits one particular difference in behavior between vert.x and node.js. How about a fibonacci benchmark? Or something non disk io related? I got similar results there where nodejs outperformed vert.x.

        So on flat disk io yes you likely won’t call fs.read on node every time you need a file. So are you saying ALL of node.js is slower because you don’t like fs.read? Seems it scales across cores well and if you take out the fs.read I get pretty wildly different numbers.

        Also, your complaint is that I’m using a 3rd party, widely accepted load testing tool rather than the one you specifically designed? no thanks, I’ll trust a standardized tool over something that’s part of your project to test performance!

        Since I’m using the same (standard) tool to test both node.js and vert.x any complaint about pipelining is moot since both are getting the same requests.

        Let’s call it what it is. fs.read is not the same as the read mechanism you use in vert.x We all know that. Aside from that, where else do you see performance problems in node? Seems that’s the only thing left as when that’s out of the picture and we use standardized load testing tools node outperforms.

        In any case I’m turning off the email notifications so I likely won’t see any further replies. I said my piece.

        On a side note vert.x does look like it has some promise and I’ll keep an eye on it. My complaint here isn’t about the project, but the methodology you are using and your attitude towards critiques of your load testing methodology (you seem disinterested in making changes to provide more useful results).

      • Tim Fox says:

        Courtney,

        Thanks for your opinion, but I respectfully disagree with it.

        Disabling the slow bits in node.js then claiming node.js is now faster, is somewhat of a Pyrrhic victory, don’t you think? ;)

        The benchmarks with the original client specifically uses a high degree of pipelining and you have disabled that too, so really it’s not the same benchmark any more.

        I’ve updated the original article with a summary at the end, as I see it.

        Thanks for your participation :)

  8. mark says:

    In my benchmarks, I have rarely seen any JVM simple app outperform node.js by these margins if at all except CPU intensive algorithms . I’ve tested with siege and apache bench. I SERIOUSLY DOUBT the results.

    • Tim Fox says:

      All the code is in github, feel free to run them yourself if you have doubts.

      • mark says:

        I still DOUBT your results. I did a dumb version of the node version and it’s still roughly the same speed. I don’t know how you’re testing but it’s wrong.

        siege -c30 -b -r500 http://localhost:3000/

        // node.js
        var express = require('express')
          , app = express.createServer()
          , fs = require('fs');
        app.get('/', function(req, res){
          fs.readFile("index.html", function(err, text) {
            res.end(text);
          });
        });
        app.listen(3000);

        Transactions: 15000 hits
        Availability: 100.00 %
        Elapsed time: 7.15 secs
        Data transferred: 2.39 MB
        Response time: 0.01 secs
        Transaction rate: 2097.90 trans/sec
        Throughput: 0.33 MB/sec
        Concurrency: 29.83
        Successful transactions: 15000
        Failed transactions: 0
        Longest transaction: 0.05
        Shortest transaction: 0.00

        // vert.x
        load('vertx.js')
        vertx.createHttpServer().requestHandler(function(req) {
          var filename = "index.html";
          req.response.sendFile(filename);
        }).listen(8080)

        Transactions: 15000 hits
        Availability: 100.00 %
        Elapsed time: 7.36 secs
        Data transferred: 2.39 MB
        Response time: 0.01 secs
        Transaction rate: 2038.04 trans/sec
        Throughput: 0.32 MB/sec
        Concurrency: 29.92
        Successful transactions: 15000
        Failed transactions: 0
        Longest transaction: 0.10
        Shortest transaction: 0.00

      • Tim Fox says:

        [Replying to Mark (above) since WordPress doesn't seem to allow replies more than 3 deep]

        Regarding your results: The load you are providing is so low (0.33MB/s) that both node and vert.x can easily handle it – hence the same results.

        You need to use many connections and lots of pipelining to really stress vert.x

  9. irishado says:

    @isaac I think it was meant as a post to start a conversation. I too would like more detail on the system, and if it had been my post I would have asked for help making each test run as efficiently as possible, to really test properly. I took from the article that the dev had found something interesting and was just sharing it, wanting confirmation that he was either right or wrong, rather than it being a “this is better than that” post.

    • spasquali says:

      Isaac’s point was simple: what is the use of an article which begins with a large disclaimer that states, in effect: what follows offers no hypothesis, no rigorous methodology, and no conclusion the author stands behind… but it is full of insinuations!

      What sort of conversation can this start?

  10. jbilcke says:

    Nice!
    It is a promising platform for JS code!
    -I would love to see a comparison with the Play! Framework (if possible on real-world examples: not hello world, but routing, templating, forms.. maybe on 100/500/1000/5000 clients?)

  11. thebentarrow says:

    That’s awesome stuff! It will open up all sorts of interesting applications. +1 for java

  12. Calle says:

    I am Interested in the CPU and memory consumption during the runs.
    Do you have those?

  13. tomkit says:

    A purported benchmark with a huge disclaimer at the beginning of the post, and posted from a vert.x blog… +1 Isaac

  14. NodeJs News says:

    So what ? We could always find a faster language … The goal is to have opportunities to use them ? Node.js is fast Vertx too : GOOD, it’s a challenge :)

    • Tim Fox says:

      Agreed. Node.js is a great framework, and the truth is for most people perf is good enough.

      Like I said at the top of the article this is a bit of fun :)

  15. Jonas says:

    > 200k req/s? What could you possibly be benchmarking that would give that result? Probably not simultaneous requests. Are you just firing them in sequence? Then this says absolutely nothing. You could just as well have no concurrency at all with these numbers.

    • Tim Fox says:

      This is explained in the article. The figure quoted is total number of req/s over 60 connections, each of which pipelines at most 2000 reqs per connection.

      [Adding comment here since I can't seem to reply to the reply to this]

      Yes, to really stress the system you need either many connections or to pipeline a lot

      • bhauertso says:

        Tim, thanks for repeating this here. We’ve done independent tests using ApacheBench and–as I just posted in a separate reply–although our results match the flavor of these results, the numbers are much lower across the board.

        Two likely reasons:

        1. ApacheBench uses a single CPU core to generate load. You can only generate sufficient load to stress a high-performance Servlet container (e.g., Resin 4.0.27) or Vert.x using multiple concurrent instances of ApacheBench. So far, we’ve limited our tests to a single instance of ApacheBench.

        2. We configured ApacheBench to create a new socket connection per request. Your use of pipelining most likely explains the majority of the difference we see in our numbers versus yours. I suspect setting up a socket (even a loopback socket) is more bulky than all of the code used to receive a request and send a response.

    • Jim2 says:

      Yes, I believe pipelining is what makes the difference. It fires requests in sequence. And node.js currently does not support pipelining.

  16. @Tim Fox: You are performing a blocking call in the streaming node server:

    https://github.com/purplefox/vert.x/blob/master/src/examples/java/httpperf/nodejs-server.js#L24

    This definitely affects the performance of the streaming version (which does not have its own file, but is instead run by uncommenting that and running it).

    You should break it out into different files so that the benchmarks are all reproducible without modifying the code.

    Thank you,
    D

    • taf2 says:

      var fs = require('fs');

      require('http').createServer(function (req, res) {
        res.writeHead(200, {"Content-Type": "text/html", "Transfer-Encoding": "chunked"});
        fs.createReadStream("foo.html").pipe(res);
      }).listen(8080, '127.0.0.1');

      // might be better?

      • Tim Fox says:

        I also updated results for chunked transfer encoding – a bit faster (using cluster) but still no cigar.
        I’ve updated the graph and the spreadsheet with the new results.
        Please see above.

    • Tim Fox says:

      I updated the example to use a non-blocking stat. It’s actually slower than the blocking stat!
      I’ve updated the graph and the spreadsheet with the new results.
      Please see above.

      • The point is that I suspect you’re doing an fs operation *at all* for every request in node, and Vert.x is not. The chunked streaming response will not make up for the fact that Node.js is doing more work than Vert.x.

        It is expected that the nonblocking stat will make the response slower, more so than a blocking one. Async isn’t some magic goo you pour over your program to make it faster. The difference is that it’ll make it more responsive while the disk IO is in process, rather than slowing down other requests. This is a major win in real life, but not in silly benchmarks.

        Yes, I realize that the file will be cached by the OS. However, the bytes still have to be copied into user space. That is not free, and it is where the node server is spending the vast majority of its time. When you cache the file contents up front, and send the same buffer with each response, it gets much faster.

        As I explained here and on Hacker News, I predict that the only way that you could be getting such high results is if a) Vert.x is not actually running single-threaded when you claim it is crippled, and/or b) it’s not actually reading the file contents each time. Otherwise there just isn’t enough time in the Node.js program for you to be saving this much.

        The Node.js load balancing mechanism is known to have issues when under extremely high load. (Ironic, right? All is fast for small n!) This will be fixed soon. But, that’s why the Node.js cluster server is sometimes showing up as only about double the single server, when it ought to be roughly a linear multiplier of min(cluster size, cpu count). Of course, if you’re using ab on localhost, then you can’t generate enough load to make a difference.

        I would be delighted to be disproven. It would be great if 50-60% of the work that a Node.js server does turns out to be extraneous, then we can make Node.js much much faster. Nothing would make me happier. However, we’re not about to start auto-caching file reads in user space, or running servers in multiple threads silently, or any other non-explicit magic.
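
        The point about caching the file contents up front can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the benchmark: makeCachedHandler and startCachedServer are hypothetical names, and foo.html and port 8080 are simply borrowed from the snippet posted earlier in this thread. The file is read once at startup and the same Buffer is sent with every response, so no per-request fs work remains.

```javascript
var fs = require('fs');
var http = require('http');

// Hypothetical helper: build a handler that sends one pre-loaded Buffer
// with every response, so no file system call happens per request.
function makeCachedHandler(body) {
  var headers = {
    'Content-Type': 'text/html',
    'Content-Length': body.length // byte length, since body is a Buffer
  };
  return function (req, res) {
    res.writeHead(200, headers);
    res.end(body);
  };
}

// Hypothetical entry point: read the file once, then serve it from memory.
function startCachedServer(path, port) {
  var body = fs.readFileSync(path); // blocking, but only once at startup
  return http.createServer(makeCachedHandler(body)).listen(port, '127.0.0.1');
}

// Usage (not run here): startCachedServer('foo.html', 8080);
```

        A cache like this does not notice when the file changes on disk, so it would have to be reloaded explicitly.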

      • Tim Fox says:

        [Reply to Isaac]

        (Incidentally – Vert.x does actually include a sendFile operation which tells the kernel to copy the file from disk to socket without copying through userspace, we don’t actually use it in the benchmark since it’s not so efficient when the files are small)

        All this talk of limiting stuff to one core is rather silly. As I’ve said in another reply no-one who cares about performance is going to have just one core to play with.

        The reality is they’ll have 8, 16, 32 or more cores, and all the results so far have shown that Vert.x performs better than Node when multiple cores are involved.

        If that’s down to a poor cluster implementation, and that gets fixed soon, great :)

        Regarding caching files in userspace – there’s no need for node to do that, vert.x doesn’t, and the JVM doesn’t, it’s the OS that does. I can only guess that Node is using an OS file API that bypasses the cache for some reason.

  17. taf2 says:

    sure this is fast… but does it work?

    /Users/taf2/work/vert.x/target/dist-build/vert.x-1.0.final/bin/vertx run org.vertx.java.examples.perf.RateCounter -cp classes -cluster -cluster-port 25501
    Starting clustering…No cluster-host specified so using address 10.24.138.79
    Started
    May 09, 2012 6:45:17 PM org.vertx.java.core.logging.impl.JULLogDelegate error
    SEVERE: Failed to create verticle
    java.lang.ClassNotFoundException: org.vertx.java.examples.perf.RateCounter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at org.vertx.java.deploy.impl.ParentLastURLClassLoader.loadClass(ParentLastURLClassLoader.java:56)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at org.vertx.java.deploy.impl.java.JavaVerticleFactory.createVerticle(JavaVerticleFactory.java:61)
    at org.vertx.java.deploy.impl.VerticleManager$2.run(VerticleManager.java:271)
    at org.vertx.java.core.impl.DefaultVertx$2.run(DefaultVertx.java:251)
    at org.vertx.java.core.impl.Context$2.run(Context.java:113)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processEventQueue(AbstractNioWorker.java:360)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:244)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

  18. G’Day Tim,

    How did you cripple the Vert.x server? Was it something like “-instances 1”? Thanks.

  20. Paul says:

    I am interested in the CPU and memory consumption during the runs.
    Do you have those?

  22. madrense says:

    I kind of gave up on posting benchmarks on the web. I posted a comparison between node.js, a simple Netty server and a simple Go server back in 2010, and Netty beat node.js.

    Still I find vert.x very interesting so for reference I did a quick run of your benchmarks and bottom line is I got ~ 60K/s using 4 vert.x instances and ~ 8K/s using 4 node.js processes.

    I also ran vert.x directly using java (got the command line by looking at ps aux | grep java) in order to pass some JVM flags. Since I’m using a 64 bit OS I believe -server is on by default so I just experimented with -Xmx128m to reduce memory usage.

    Here is my run of your benchmark:

    Intel i7 Q820 – 4 cores + HT
    Ubuntu 11.04 x64

    Java 1.7.0_01-b08
    node v0.6.17
    vert.x 1.0.final

    ab -k -n 100000 -c 100 http://localhost:8080/
    Mem analysis:
    Using pmap: with pmap $PID | tail -1 >> mem.txt; sleep 0.001
    Using system monitor: non scientific analysis by looking at the processes :P

    vert.x (1 instance)
    ———————
    Mem (pmap): 2.4G
    Mem (System Monitor): 120M
    Requests per second: ~ 30K/s

    (Added JVM flags -Xmx128m)
    Mem (pmap): 446M
    Mem (System Monitor): 75M
    Requests per second: ~ 30K/s

    vert.x (4 instances)
    ————————
    Mem (pmap): 2.4G
    Mem (System Monitor): 120M
    Requests per second: ~ 60K/s

    (Added JVM flags -Xmx128m)
    Mem (pmap): 513M
    Mem (System Monitor): 75M
    Requests per second: ~ 60K/s

    vert.x (2 instances)
    ————————
    Requests per second: ~ 50K/s

    Node Server (1 process)
    ———————————
    Mem (pmap): ~ 640M
    Mem (System Monitor): ~ 65M
    Requests per second: ~ 2.6K/s

    Node Cluster (4 processes)
    ———————————–
    Mem (pmap): ~ 640M per process ( +1 for the main process) => 3.2G
    Mem (System Monitor): 4.5M + 4 * 65M => 264.5M
    Requests per second: ~ 8K/s

  23. dl says:

    I modified the static file (no clustering) node version a little just to be sure the OS wasn’t doing anything to mess with the results. The gist is here: https://gist.github.com/2653772

    I did not test the node clustering features as these are bleeding edge and I recall Isaac saying many times that it still needs to be worked on.

    The results I get when serving a static file seem way off compared to yours (in terms of difference in speed). Keep in mind this machine is a lot slower (dual Opteron). Node streams didn’t do as well as the simple file server, so I didn’t include them here.

    Vert.x (1.0.final)
    Instances: 1
    Results: 63893 Rate: count/sec: 7284.76821192053 Average rate: 6510.885386505564

    Node.js (0.6.17)
    Instances: 1
    Results: 20301 Rate: count/sec: 6000.0 Average rate: 6206.590808334565

  24. Tim Fox says:

    Why are you running Vert.x with -instances 1? – that’s deliberately crippling it.

    • dl says:

      I did the same with node. Like I said, node clustering isn’t considered a terribly useful feature yet.

  25. broofa says:

    @Tim: First, nicely played … :)
    Opening Disclaimer: So don’t read too much into these results.
    Closing Summary: The numbers speak for themselves.

    Second, I’m very interested in this benchmark and would love to see it tweaked to address some of the concerns that have been raised. To me, the most interesting part of this is how the different languages vert.x supports compare to one another, and to node.

    Unfortunately the controversy around filesystem caching and clustering is getting in the way of this. How do you meaningfully compare a multi-core, cached static file server to a single-core, uncached one? You don’t – the systems in question are too different to make the results meaningful. Or if you do, it’s only via a bit of extrapolation and guesswork, which definitely diminishes the value of the results.

    Thus, Tim, to answer your question above, the reason for deliberately crippling Vert.x to a single core is to make the comparison with Node more meaningful. We all know that multi-core support is not an intrinsic part of node and that it’s an issue that has to be dealt with. But there are solutions, so let’s not worry about that for the moment. Instead, I’d prefer we establish a benchmark that tests functionality that is as similar as possible so we can focus on the other areas of comparison.

    • Tim Fox says:

      Hi broofa,
      Neither Vert.x nor the JVM caches the files.

      About crippling: It’s more meaningful to compare a _non-crippled_ Vert.x server. After all people aren’t going to deploy crippled servers in production.

      I can see it now: “We’re using Vert.x for our web site, but we’ve crippled it to use one core, just so we can be fair to Node.js” << Not going to happen.

      To be honest, I should never have published the "crippled" Vert.x results – the reality is you take a piece of software and you run it on your hardware. If one of the competitors can't scale over that hardware you don't break the other one just to be seen to be "fair". It's nonsensical.

      • broofa says:

        [Before I go further, thanks for posting this benchmark. I'm getting a lot out of this conversation so don't think it's going unappreciated. :) ]

        Tim, the caching and clustering issues are both well-known in the node community, with a multiplicity of solutions in both areas. It’s certainly fair to talk about code and configuration complexity needed to address these problems in Node, but saying they’re an important basis for a performance comparison is nonsense. Anyone interested in Node perf will certainly be willing to take the half-day needed to configure the necessary static file and clustering support available for node.

        What’s most interesting here is how node and vert.x perform with these known issues taken out of the equation (or appropriately factored into it.) But insisting on comparing multi-core vert.x setup with a single-core node setup is just arguing for apples-to-oranges comparison, which has little value to anyone.

  26. Tim Fox says:

    >>>> But insisting on comparing multi-core vert.x setup with a single-core node setup is just arguing for apples-to-oranges comparison, which has little value to anyone.

    The number of cores it uses is not really relevant for the end user. They have some hardware and they want to see how much work it can do. That’s really what it boils down to.

    What doesn’t have value is artificially crippling one system just to be “nice” to the other system. That has little value since the user is never going to do that :)

    • jbilcke says:

      Maybe someone could also port the benchmarks to RingoJS? This project is also playing in the field of “Server-side JavaScript” on the JVM, but the benchmarks I found seem to be quite old: http://hns.github.com/2010/09/21/benchmark.html
      Also, they don’t measure the same things.. so the eternal question of “what is relevant?” is still here.

    • dl says:

      Interesting. I hope I didn’t come across as trying to mislead anyone. I thought the fact that I limited vert.x to one instance was obvious. My intent was to find out where the real difference in speed is and, as expected, it’s in node’s multicore support. It doesn’t appear that there’s anything special (outside of this) that’s causing the speedup. This is a well known issue within the community. It’s good to point this out for anyone who may be attempting to make comparisons when deciding on a platform as some people may not realize it.

      At first, when I read the blog post, I was intrigued that vert.x may have some special sauce beyond its clustering capabilities that would make it the clear winner in all use cases. This is why I limited vert.x to one instance to make sure that there wasn’t more to it. I simply felt this was the best apples to apples comparison.

      To be clear, as I stated, I compared node’s cluster performance against a multi instance vert.x server and those results matched yours.

      So yes, vert.x appears to have much better multi-core support. That said, it shouldn’t be overly shocking. This has been an issue in node for quite some time now and it’s one of the trade-offs one must make when running it. Hence the reason a lot of node setups involve some sort of nginx reverse proxy with multiple redis backed node app servers.

      • Tim Fox says:

        I didn’t get the impression you were misleading anyone :)

        There is a little bit of “special sauce” in vert.x, e.g. we do zero copy from file to socket if you use the sendFile function. I don’t think node does anything like this. This can be really efficient for large files.

        Also, from what comparisons I’ve seen Java on JVM is generally somewhat faster than JS on V8 – there are some exceptions, e.g. the V8 regex impl is really fast, so ymmv.

        But yes, fundamentally Vert.x utilises the CPU resources better than Node.js. BTW this is not clustering – it’s all happening within a single process.

    • broofa says:

      >>> What doesn’t have value is artificially crippling one system just to be “nice” to the other system. That has little value since the user is never going to do that.

      I agree that testing across multiple cores is useful. Clearly how well a server scales as the # of cores increases is an important trait. But that’s best characterized by testing with different numbers of cores. Testing with 6 cores, as you’ve done above is just one data point. It shouldn’t have any more or less significance than testing with 8 cores, or 32 cores… or 1 core.

      Note, too, that it’s not unreasonable to expect readers to extrapolate from a single core to however many cores they happen to have. Is it a direct multiplier? No, of course not. But in relative terms it’s a decent indicator of multicore performance. Your own data backs that up.

  27. Tereska says:

    On 1-core instances vertx.js is slower than node when serving small responses (100 bytes) and only slightly faster when serving larger responses (35000 bytes).

    • Tim Fox says:

      You’re just repeating what someone else has already said, and I’ve already replied to that :)

      • Tereska says:

        Just sharing my results….. nothing more ;P

      • Tim Fox says:

        The node.js obsession with running on a single core is bizarre. As we all know *all* computers these days have just one core, right?

      • dl says:

        It’s not really an obsession. It’s just the only way anyone should run node right now. The cluster feature in node is highly experimental. It’s a trade-off. So, you typically have to launch multiple processes if you need to support additional cores. Everyone agrees that this sucks and the cluster module is an attempt to fix this one day.

      • broofa says:

        >>> “The node.js obsession with running on a single core is bizarre. As we all know *all* computers these days have just one core, right?”

        It’s not an obsession. It’s pragmatism. Cores are the atomic unit of CPU resources; it’s not unreasonable to extrapolate from single-core perf to multicore perf; Node happens to be easy to test on a single core; and, most important, testing a single core is just easier since it requires fewer resources, as we all know:

        >>> “All the benchmarking is done on a single machine (my desktop). This is not ideal…”

        *ahem*

        >>> “The number of cores it uses is not really relevant for the end user. They have some hardware and they want to see how much work it can do. That’s really what it boils down to.”

        … so how does one extrapolate from your 6-cpu test to a system with 4, 8 or 32 cores? Answer: the same way they’d extrapolate from a single core test to your 6-core test.

    • Tim Fox says:

      BTW… for larger files with vert.x it’s recommended to use the sendFile command, not readFile

      • Tereska says:

        In my example I’ve cached the files in memory before the server starts, so this is irrelevant, but thanks for pointing this out.

        About 1 core and node: this is because it is the only way you can compare vertx with node right now. Of course there is cluster, but it’s still in an experimental phase. Of course you could set up LVS for 1-core node machines and then compare that with vertx.

        Is there any npm (package manager) or something like it for vertx?

        Don’t get me wrong, I like vertx.js :) I like node.js also :)

      • Tim Fox says:

        Well… sendFile tells the kernel to copy the file direct from disk to socket without going through userspace so it should be more efficient (with larger files) even if you cache the file in memory since it avoids you writing the file manually to the socket.

        I don’t get the whole “cores must be the same to compare” argument. That makes no sense to me.

        It’s like having a race between a motorcycle and a car, and forcing the car to remove two of its wheels so they have the same number of wheels.

        So what if one utilises cores better? The purpose of the race is to see which one crosses the finish line first.

        Requiring same number of cores is a completely arbitrary requirement, you might as well require there are the same number of lines of code in the projects.

      • If you want to compare vert.x with multiple cores, how about using node.js the way people actually run it on multiple cores?

        How about using a load balancer or reverse proxy and running multiple node instances? That is, if you’re interested in comparing apples to apples. If, on the other hand, you simply want to compare what a machine can do when you use one core vs multiple cores… then really it’s not a surprising result.

        The reason for comparing on a single core for both is to take internal efficiency out of the comparison: if the two are about equal on one core, then the speed advantage of vert.x comes from using multiple cores. The idea is to remove all variables to find the source of the speed advantage. Typically when doing benchmarks that’s the goal: find the real source of the difference.

        If you are simply not interested in why vert.x is faster in some scenarios, then so be it.

  28. drcypher says:

    So, what’s the context of the comparison? Similarity or user happiness? What broofa says is you’re comparing apples to oranges and your reply is… “yeah, but people like fruit”?

    Numbers don’t “speak for themselves”. They mean something as long as they correspond to something that’s meaningful. Node.js “reads text files” faster than.. me, does that make it cooler now? Would you make a graph out of it? Or is it a “silly comparison between two different things”?

  29. Tim Fox says:

    The context is simply comparing one system against the other on some hardware, and seeing which one can do more work.

    • Tereska says:

      I think you should run a test like this:
      vert.x js on a single machine with 8 cores
      vs
      8x nodejs processes (without cluster), each bound to its own core and balanced by LVS

      If vertx beats this setup, that will be news for everyone!

      • +1 This.

        Would also be good if the two benchmarks didn’t exploit a difference in how node and vertx handle file reading. i.e. have both cache the files.. A true apples to apples comparison then.

        Those two things, and I suspect the results would be wildly different.

      • Tim Fox says:

        Vert.x does not cache any files. The caching will be going on at the OS layer. I suspect this will explain the better file system performance.

        I’ve no idea why node.js doesn’t benefit from the caching – most probably it is using a different OS api to the JVM.

        Just manually caching files in your node.js code to get better performance is not really the same thing as relying on the OS, e.g. if the file is changed on the disk the OS cache will be automatically invalidated – this won’t be the case if you do simple caching yourself as proposed.

  30. Assem Hakmeh says:

    Tim,

    Thank you for the benchmark. Given the context of simply using a platform with default (recommended) settings and doing a simple use case comparison, I find this quite meaningful.

    I am certain that given enough time, someone interested in using node can work pretty hard and boost the performance up to a reasonable level. Using an lb and many cores/servers, I’m sure node can perform well and that’s beside the point.

    However, it seems obvious that vert.x is leaner, more efficient, and makes better use of the hardware out of the box, thanks to the JVM and mature libraries like netty. This translates to improved productivity and value for the developer and their client. In my book that’s what matters most.

  31. koichik0818 says:

    Hi Tim,

    Why do you use HTTP pipelining? It is not used widely on the Web. Only Opera enables it by default as far as I know. I think that your benchmark is not realistic.

    Instead, you should use more and more connections. Please remember that the single-threaded event-loop architecture was rediscovered to solve the C10k problem.

    • Tim Fox says:

      Pipelining (both http and websockets) is going to become more and more important going ahead for the next generation of rich client side “real-time” JS apps.

      Vert.x aims to be the framework of choice for these kinds of apps. That’s why I’m measuring perf with pipelining.

  32. kohlerm says:

    Here’s my summary:
    1. The JVM/Vertx has better multicore support. Yes, there are solutions for node.js to use more than one core, but they seem to be less efficient than the Vert.x solution. Running more than one node.js process will add additional memory overhead. Running more processes on a machine versus running more threads is also less efficient.
    2. The performance characteristics of certain APIs in node.js might be less predictable than on the JVM. The reason is probably that the JVM is just older and therefore the libraries are on average probably more mature.

    Note that I would guess that in general the memory overhead of node.js for applications with a large number of objects is larger than that of a JVM-based solution. This is due to the fact that JavaScript is a prototype-based language, which has a more complicated object layout compared to a VM that is optimized for a statically typed language.

  35. I’ve started investigating several aspects of this benchmark. Here’s one, regarding the crippled test.

    I’m not sure that crippling vert.x to use a single core by using “-instances 1” is valid. Here’s what happened when I tried:

    # time ../../bin/vertx run org.vertx.java.examples.httpperf.PerfServer -cp classes -instances 1 -cluster-host 127.0.0.1
    ^C
    real 0m31.725s
    user 0m33.870s
    sys 0m21.175s

    This shows that during 31 seconds of elapsed time, vertx consumed 55 seconds of CPU time (33.8 + 21.2), which is only possible if it is multi-threaded and running concurrently. At this rate, it is using about 1.7 CPU cores.
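
    The arithmetic behind that figure can be written out as a small sketch (coresUsed is a hypothetical helper name, not part of any tool): divide the total CPU time (user + sys) by the elapsed wall-clock time, using the three values printed by time.

```javascript
// Hypothetical helper: average number of CPU cores kept busy, computed
// from the three figures printed by the shell's time command (in seconds).
function coresUsed(real, user, sys) {
  return (user + sys) / real;
}

// Figures from the run above: real 31.725s, user 33.870s, sys 21.175s.
var cores = coresUsed(31.725, 33.870, 21.175);
console.log(cores.toFixed(1)); // prints "1.7"
```

    Any value above 1.0 means more than one thread was on-CPU at the same time for part of the run.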

    Double checking using DTrace:

    # pargs 67542
    67542: java -Djava.util.logging.config.file=../../bin/../conf/logging.properties -Djru
    argv[0]: java
    [...]
    argv[9]: org.vertx.java.examples.httpperf.PerfServer
    argv[10]: -cp
    argv[11]: classes
    argv[12]: -instances
    argv[13]: 1
    argv[14]: -cluster-host
    argv[15]: 127.0.0.1
    # dtrace -n 'profile-1000 /pid == 67542/ { @ = count(); } tick-1s { printa("%@d samples", @); trunc(@); }'
    dtrace: description 'profile-1000 ' matched 2 probes
    CPU ID FUNCTION:NAME
    5 64930 :tick-1s 1688 samples
    5 64930 :tick-1s 1701 samples
    5 64930 :tick-1s 1712 samples
    5 64930 :tick-1s 1675 samples
    5 64930 :tick-1s 1705 samples
    5 64930 :tick-1s 1703 samples
    5 64930 :tick-1s 1715 samples
    5 64930 :tick-1s 1686 samples
    5 64930 :tick-1s 1687 samples
    5 64930 :tick-1s 1704 samples
    5 64930 :tick-1s 1690 samples

    This shows vertx was on-CPU for about 1700 samples per-second, sampled at 1000 Hertz. This is actually a more coarse measurement (the previous time command hooks into thread microstate accounting), but enough to double check. It’s the same result as the time test.

    With vertx running on 1.7 CPUs, its performance (in my setup) was (via RateCounter):

    22528 Rate: count/sec: 47682.11920529801 Average rate: 45809.65909090909
    25548 Rate: count/sec: 46357.615894039736 Average rate: 45874.43244089557
    28568 Rate: count/sec: 47019.867549668874 Average rate: 45995.519462335484
    31588 Rate: count/sec: 47682.11920529801 Average rate: 46156.76839306066
    34608 Rate: count/sec: 47019.867549668874 Average rate: 46232.085067036525
    37628 Rate: count/sec: 47682.11920529801 Average rate: 46348.46390985436

    To do the crippled single core test, you can use pbind (on OSes that have it) to bind vertx to a single CPU core, and the same for node.js.

    # pbind -b 10 96587
    process id 96587: was not bound, now 10

    The vertx performance becomes:

    113129 Rate: count/sec: 25165.562913907284 Average rate: 32228.694675989358
    116149 Rate: count/sec: 26490.066225165563 Average rate: 32079.48411092648
    119170 Rate: count/sec: 16550.810989738497 Average rate: 31685.82696987497
    122190 Rate: count/sec: 15231.788079470198 Average rate: 31279.155413699977
    125210 Rate: count/sec: 12582.781456953642 Average rate: 30828.20860953598
    128230 Rate: count/sec: 14569.53642384106 Average rate: 30445.29361303907
    131250 Rate: count/sec: 11258.278145695363 Average rate: 30003.809523809523
    134270 Rate: count/sec: 20529.80132450331 Average rate: 29790.72019066061
    137290 Rate: count/sec: 27152.3178807947 Average rate: 29732.682642581396
    140310 Rate: count/sec: 26490.066225165563 Average rate: 29662.889316513436
    143330 Rate: count/sec: 26490.066225165563 Average rate: 29596.03711714226
    146350 Rate: count/sec: 26490.066225165563 Average rate: 29531.943969935088
    149370 Rate: count/sec: 15894.039735099337 Average rate: 29256.209412867378
    152390 Rate: count/sec: 14569.53642384106 Average rate: 28965.155193910363
    155410 Rate: count/sec: 11920.529801324503 Average rate: 28633.936040151857
    158430 Rate: count/sec: 15231.788079470198 Average rate: 28378.463674809063

    Dropped from 47k to between 11k and 27k, and variable.

    Double checking that pbind worked as intended:

    # dtrace -n 'profile-1000 /pid == 96587/ { @ = count(); } tick-1s { printa("%@d samples", @); trunc(@); }'
    dtrace: description 'profile-1000 ' matched 2 probes
    CPU ID FUNCTION:NAME
    11 64930 :tick-1s 961 samples
    11 64930 :tick-1s 970 samples
    11 64930 :tick-1s 969 samples
    11 64930 :tick-1s 988 samples
    11 64930 :tick-1s 951 samples
    [...]

    I’m testing on a 24-way idle server running SmartOS. I’m using loopback and a single system, as I want to take lower-level TCP/IP stack off the table and focus on differences between the applications. I’m assuming both vert.x and node.js use the lower-level TCP/IP stack in the same way, allowing it to be eliminated (I can check that using tcpdump/snoop/dtrace if needed). The system has enough CPUs to run the client, and, is using the same client for both.

    Anyway, that’s just the first aspect I was looking at. More to come (if I have the time).

    • Tim Fox says:

      Brendan,

      I should have been more clear, -instances 1 does not strictly limit the number of cores used by the JVM process to 1. In fact there is no way in Java to tell the kernel what cores to put any threads on.

      What it does is tell Vert.x to use only one *event loop*. Vert.x internally uses other threads for other stuff, e.g. blocking IO calls. When these actions complete, their results are put back on the event loop.

      AIUI this is no different to how node.js works http://www.quora.com/How-does-IO-concurrency-work-in-node-js-despite-the-whole-app-running-in-a-single-thread

      Also the JVM itself maintains other threads, e.g. for garbage collection.

      The net result is there are in fact several threads (you can see these for yourself using any Java monitoring tool), it’s up to the operating system to distribute them as it likes over available cores. Vert.x (or any Java program) cannot influence that.

      Too many people on this thread are focusing on the results I published for a “single core” (I should have more correctly said a single *event loop*). As I’ve said in other replies, I think these results are largely irrelevant and I should never have published them, since they don’t reflect what anyone would do in real life.

      So I’ve updated the original post and removed the single event loop results so people can concentrate on the uncrippled results.

  36. There’s some confusion about the “cores must be the same to compare” comments. I don’t think anyone has mentioned price/performance yet.

    In environments I’m familiar with, customers are paying for instances by the core (or by a specific fraction of a CPU). So, either having the cores the same, or dividing by core count, will normalize the result. Another way to do this is to divide the result by the cost of the system.
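
    As a rough sketch of that normalization (perCore and perDollar are hypothetical helper names, and the hourly price below is a made-up placeholder, not a real quote), the same throughput figure can be divided either by the cores used or by the cost of the system:

```javascript
// Hypothetical helpers for comparing servers that run on different hardware.

// Throughput divided by the number of cores the server actually used.
function perCore(requestsPerSec, cores) {
  return requestsPerSec / cores;
}

// Throughput divided by what the system costs per hour.
function perDollar(requestsPerSec, dollarsPerHour) {
  return requestsPerSec / dollarsPerHour;
}

// Example using figures reported earlier in this thread: roughly 46,000
// req/s while using about 1.7 cores.
console.log(Math.round(perCore(46000, 1.7))); // prints 27059

// Placeholder price: $0.50 per hour is an assumption for illustration only.
console.log(perDollar(46000, 0.5)); // prints 92000
```

    Either figure lets results from a 1-core run and a 6-core run sit on the same scale, which a raw req/s total does not.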

  37. David says:

    Not sure where this discussion has drifted, but I was planning to do my own benchmarking of node and vert. I use both node and groovy in production, and have 15 years of experience building and deploying real-time web-based applications. I’m currently evaluating whether to continue polyglot, or choose one platform, hence my interest in this topic. My comments are as follows:

    Firstly, I would not test anything Serially – it gives the synchronous JVM platform a huge advantage and, more importantly, doesn’t represent any real-world use case (unless you anticipate a small number of users issuing thousands of requests per second!). So I’d suggest that you find a way to simulate Concurrent access by 1,000-10,000 simultaneous clients. This is likely to be easier to generate using node as the test harness.

    Secondly, I would avoid the file system altogether, because: a) any real-world solution would use some sort of caching anyway, and b) as you’ve learned, by including the disk you’re not actually testing the web server itself, but rather the file I/O subsystem of each framework! I am somewhat surprised that you chose to include disk access in a comparison of web servers; it doesn’t exactly inspire confidence in the vert.x team… I’d suggest a REST-based request for a timestamp, using a unique URL for each request (i.e. add the local time to the URL). The last thing I’d ever test would be a repeated request for the same URL, which you appear to have done. There are simply too many places for caching to occur in the pipeline (including in the browser!).

    Thirdly: We need to see memory usage. Traditional JVM-based web servers run out of memory when faced with many concurrent connections from simultaneous clients. Several readers have already made this request, but it bears repeating: Show us the memory!
    Unless you’ve demonstrated clearly that your JVM solution will not blow up under concurrent load, I don’t think you’re going to convert any node users to your platform.

    The real question, as yet unanswered, is this:
    Can the speed advantage of the JVM vs V8 (4-5x based on the current language shootout benchmarks) make up for the async processing framework that is node, when faced with a load of several thousand simultaneous/concurrent clients?

    My guess is that if you can get the JVM to handle requests asynchronously then it will easily outperform node, especially on multiple cores.

    • Tim Fox says:

      Hi David,

      The benchmark does not test ‘serially’ (I assume by ‘serially’ you mean using one connection?). It uses 60 concurrent connections.

      Also all requests are handled asynchronously. Vert.x is an async platform like node.

      I totally agree that the benchmark can be improved in many ways. Like it says in the disclaimer at the top of the article, it was meant as a bit of fun, not as a serious benchmark. Read into it what you will :)

      • David says:

        @Tim:

        1) Regarding the issue of concurrent connections:

        Perhaps I should have said ‘relatively serially’. Achieving thousands of requests per second with 60 connections does not, especially on the JVM, allow you to extrapolate that you can achieve the same number of requests per second with thousands of unique concurrent connections.

        Your benchmark has demonstrated that you can get good throughput with 60 concurrent clients on the JVM, period. I agree that it is quite impressive on the JVM, but I don’t agree that you can extrapolate that achievement to imply that you can get the same aggregate throughput with > 60 clients. It doesn’t even mean that you can get ANY throughput at all with > 60 concurrent clients.

        I’m willing to bet that you could achieve the same or better throughput (and with even lower latency) on Tomcat or Jetty, with 60 threads, enough memory, and a Restlet long-polling persistent servlet-per-thread connection technique, circa 2003! I.e. if your business objective is to get maximum throughput with at most 60 clients on the JVM, then just give each client its own thread and give the JVM enough memory.

        However, try 1000 concurrent long polling connections on Tomcat, and you’ll achieve zero throughput because the JVM will have crashed before you got past, at absolute best, a few hundred concurrent long polling connections (and after consuming a few GB of RAM). Been there, done that (in production, oops).

        Node on the other hand, handles 1000 concurrent connections easily, with one thread, a tiny amount of RAM, and negligible CPU usage – you could do this on a free micro AWS EC2 machine.

        Given that the price of cloud computing is based largely on RAM, you start to see why we’re asking to see the memory: it’s a question of CPD (connections per dollar).

        2) Regarding the issue of async processing:

        I’m struggling with your comment:

        “all requests are handled asynchronously. Vert.x is an async platform like node”.

        This might be true at the request level, however, node goes further than that.
        In node, every function in the framework is asynchronous and non-blocking implicitly, that’s just the way it is (and it’s a huge pain to work with actually).
        On the JVM, while the Netty IO is non-blocking, all of the other methods which process the request won’t be non-blocking and asynchronous unless you explicitly implement them as such.

        I continue to believe that a JVM solution should outperform node, but I don’t think your benchmark demonstrates that you’ve achieved this yet.

        Keep up the good work, you’ll get there in the end.

      • Tim Fox says:

        David,

        >>> Perhaps I should have said ‘relatively serially’. Achieving thousands of requests per second with 60 connections does not, especially on the JVM, allow you to extrapolate that you can achieve the same number of requests per second with thousands of unique concurrent connections.

        Well, I didn’t claim it did ;) The purpose of this benchmark was not to test performance with a large number of connections.

        BTW… Vert.x is designed to cope with a very large number of connections. I have tested up to 40k connections successfully on the same desktop used for this benchmark, and we hope to show a server with 1M connections in a different benchmark before too long.

        The main problem with testing many connections is running out of ephemeral ports on the client side, so you need lots of clients :) But anyway, that’s for another benchmark.
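To put the ephemeral-port problem into numbers, here is a back-of-the-envelope sketch. It assumes a client port range of 32768 to 60999, which is a common Linux default (check `net.ipv4.ip_local_port_range` on your own system), and it assumes every client address connects to a single server address and port. `clients_needed` is a hypothetical helper, not part of Vert.x.

```python
import math


def clients_needed(target_connections, port_low=32768, port_high=60999):
    """How many distinct client addresses are needed to open
    target_connections connections to one server address and port."""
    ports_per_client = port_high - port_low + 1  # inclusive range
    return math.ceil(target_connections / ports_per_client)


# With the assumed default range, each client address has 28,232 usable ports.
million = clients_needed(1_000_000)
```

Under those assumptions a 1M connection test needs 36 client addresses. Widening the port range, or having the server listen on several ports, reduces that number.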

        >>> This might be true at the request level, however, node goes further than that.
        In node, every function in the framework is asynchronous and non-blocking implicitly, that’s just the way it is (and it’s a huge pain to work with actually).

        The vert.x api is completely non blocking as well. No different to node here. [Actually both Vert.x and node.js have some synchronous file system functions, but these aren't used in the tests].

    • David you raise very good points. I think we all need to see past the numbers. The primary purpose of any benchmark or test should be to allow observations under particular working conditions which aid learning and understanding in a way that can be applied elsewhere.

      There are two points of view here we need to always keep in mind: macro and micro.

      [micro] This benchmark aims to determine the minimum cost and maximum throughput one could achieve when doing pretty much nothing, compared with the realities of real-world software execution and resource consumption. But in mapping this to a more realistic workload we need to determine how much of the cost of the system, application or node (service) is actually framework execution cost. This could be as little as 10%, so we could in fact be fighting over single-digit percentage differences. [macro] That said, in a more complex network with lots of inter-dependencies (interconnectedness) such percentages can very easily add up (think supply chain dynamics & management).

      If Tim’s fine implementation work at the application-meets-the-framework level can match node.js’s feature set and ease of development, and knowing the JVM is years ahead in dynamic runtime compilation & optimization (with a language design that is much better aligned to latency cost optimization), one could very easily see the benefits that this particular solution offers over an implementation of the past.

      What such benchmarks offer is a glimpse into the level of care and quality of design and engineering that has gone into a solution, assuming the benchmark has not been tailored to play to its strengths with a complete disregard for weaknesses that are far worse than one would envisage.

      • David says:

        @JXInsight: I just took a look at your website, hadn’t heard of JInspired before but it seems pretty impressive, certainly you are the right person to weigh in on this topic. Have you tried to profile vert.x yet?

      • @JXInsight: you say “with a language design that is much better aligned to latency cost optimization”; could you provide a specific example of such “language design”?

      • @JXInsight: you also say “What such benchmarks offers is a glimpse into the level of car[e] and quality of design and engineering that has gone into a solution …”. I would not say that this is a reliable correlation.

        Such benchmarks test a particular code path, with a particular workload. The next exercise is to see how relevant that code path, and that workload, is for your intended use.

        If you want care and quality of design and engineering, you may be better off looking at the software’s test suite, than a particular micro benchmark.

  38. I was interested in the differences in file system access between node.js and vert.x. It’s obvious from Courtney’s result that there is something going on here. Some have suggested that the JVM or vert.x is caching, but looking at the code I can’t find any evidence of this.

    To investigate I ran vert.x with -instances 1 and node.js through strace. The results are on github. Excerpts for the initial runs can be seen here:

    vert.x: http://bit.ly/JlK7i4
    node.js: http://bit.ly/JlK7i4

    From these we can see that vert.x and node.js are both performing read(2) access for every request.

    What’s interesting, though, is the pattern of access. First, vert.x’s pattern looks something like:

    stat("foo.html")
    open("foo.html") = 42
    read(42, …)
    close(42)
    stat("foo.html")
    open("foo.html") = 42
    read(42, …)
    close(42)
    stat("foo.html")
    open("foo.html") = 42
    read(42, …)
    close(42)

    In fact, vert.x only ever has one file descriptor open at a time. This suggests to me that the current implementation of readFile in vert.x might be considered bugged. Consider what happens if you are reading files of different sizes, etc. According to these results your small files would be blocked waiting for larger files to be read.

    In contrast, node.js is interleaving file operations:

    open("foo.html") = 16
    open("foo.html") = 17
    open("foo.html") = 18
    read(16, …)
    read(17, …)
    read(18, …)
    close(16)
    close(17)
    close(18)

    In essence, node.js is treating each stage of the file access as an asynchronous event while vert.x is treating the entire file read as a single synchronous operation.
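The two access patterns can be reproduced in miniature. The sketch below is plain synchronous Python, so it only mimics the ordering of the system calls seen in the strace output and says nothing about how either framework schedules them. The function names and the temporary file are made up for the illustration.

```python
import os
import tempfile


def sequential_reads(path, n, trace):
    """One file at a time: open, read, close, then move to the next request."""
    for _ in range(n):
        fd = os.open(path, os.O_RDONLY)
        trace.append("open")
        os.read(fd, 4096)
        trace.append("read")
        os.close(fd)
        trace.append("close")


def interleaved_reads(path, n, trace):
    """Each stage runs for all pending requests before the next stage starts."""
    fds = []
    for _ in range(n):
        fds.append(os.open(path, os.O_RDONLY))
        trace.append("open")
    for fd in fds:
        os.read(fd, 4096)
        trace.append("read")
    for fd in fds:
        os.close(fd)
        trace.append("close")


def demo():
    # Write a small throwaway file to read back.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"<html><body><h1>Foo!</h1></body></html>")
        path = f.name
    try:
        seq, inter = [], []
        sequential_reads(path, 3, seq)
        interleaved_reads(path, 3, inter)
        return seq, inter
    finally:
        os.unlink(path)
```

In the sequential version only one file descriptor is ever open at a time, so a slow read holds up everything queued behind it. In the interleaved version three descriptors are open at once, which matches the distinct descriptor numbers in the node.js trace.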

    To try to measure the effect this access pattern has I wrote a new vert.x PerfServer that uses AsyncFile and chunked encoding:

    http://bit.ly/JJ4vYr

    The response rate results looked like this:

    vert.x original: ~10,000 resp / sec
    vert.x async file: ~6,600 resp / sec
    node.js: ~4,000 resp / sec

    For reference, this was measured on a two core Ubuntu VM. I used -instances 1 on both server and client to attempt to avoid CPU competition between them.

    Note, in this setup node.js is CPU bound using 99% of one of the cores. The vert.x server is using about 80% of a core and the client is using about 40% of a core.

    So this explains some of the variation, but not all.

  39. The next aspect I examined was the limiter for the original node.js fs.readFile benchmark. It’s a bit long, so I’ve put it here:

    The tl;dr is that about 2x (or more) is likely due to FS object allocation and GC. If this type of usage is real world, the fs.js package could be improved.

  40. koichik0818 says:

    Vert.x uses java.nio.channels.AsynchronousFileChannel, and it is described as:

    http://docs.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousFileChannel.html

    >> As with FileChannel, the view of a file provided by an instance of this class is guaranteed to be consistent with other views of the same file provided by other instances in the same program.

    It seems to suggest that some kind of cache exists between AsynchronousFileChannel instances.

    • As Ben showed, vert.x is calling open(), read() and close() on the file. This is also performed at the same rate as the result:

      # dtrace -n 'syscall::open:entry /pid == $1/ { @[copyinstr(arg0)] = count(); } tick-1s { printa(@); trunc(@); }' 95933
      dtrace: description 'syscall::open:entry ' matched 2 probes
      CPU ID FUNCTION:NAME
      3 64930 :tick-1s
      httpperf/foo.html 52528

      3 64930 :tick-1s
      httpperf/foo.html 55878

      3 64930 :tick-1s
      httpperf/foo.html 55738

      from around the same time:

      725784 Rate: count/sec: 56291.39072847682 Average rate: 60574.49599329828
      728804 Rate: count/sec: 54966.887417218546 Average rate: 60551.259323494385
      731824 Rate: count/sec: 56291.39072847682 Average rate: 60533.68022912613
      734845 Rate: count/sec: 54948.69248593181 Average rate: 60510.71994774408

      These are about the same. If vert.x is caching, it’s bypassing its own cache.
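The comparison can be made explicit by dividing the average open() rate by the average response rate, using only the figures quoted above. The variable and function names are made up for this sketch.

```python
from statistics import mean

# open() calls per second, from the three dtrace tick-1s samples above.
opens_per_sec = [52528, 55878, 55738]

# "Rate: count/sec" values printed by the benchmark client around the same time.
responses_per_sec = [
    56291.39072847682,
    54966.887417218546,
    56291.39072847682,
    54948.69248593181,
]


def opens_per_response(opens, responses):
    """Ratio of average open() calls to average responses per second."""
    return mean(opens) / mean(responses)


ratio = opens_per_response(opens_per_sec, responses_per_sec)
```

The ratio comes out at about 0.98, i.e. roughly one open() per response. An application-level cache that was actually being hit would push this ratio well below 1.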

      • I should add that caching is happening, on my system, by the OS file system. Hence the microsecond read times in the latency distribution plots I posted. I haven’t seen evidence that the application is caching as well.

  41. In my previous test I had a hard time getting the vertx server to saturate CPU. Based on Brendan’s findings about -instances not strictly being single core, it seems likely that it was using the second core in the VM. Given that node.js was saturating on CPU, I wanted to get vertx into a similar situation to compare numbers under the same restricted resources.

    Since I lack additional hardware, I tried running some tests on a couple of Joyent cloud VMs. I used:

    vertx/node server: small, 1GB, 1 CPU
    vertx client: small, 1GB, 2 CPU

    Both were running Debian 6. I used the 2 CPU client so that I could run the vertx client with -instances 2 to fully saturate the server. Node was version 0.6.17.

    I tested the chunked node implementation and vert.x using my AsyncFile version of the PerfServer.

    Average results:

    Node: 2040 responses / sec
    Vert.x AsyncFile: 1760 responses / sec

    I ran a regression on the data and it showed the result was statistically significant with 99% confidence. The R-squared was 65% meaning about 35% of the overall variation in the results was not explained. (This could be garbage collection, VM performance variations, etc.)

    Both Node and Vert.x were CPU bound on the single core VM. Node showed a CPU breakdown of 50% user and 50% system. Vert.x showed a CPU breakdown of 20% user and 80% system. I assume the overall increase in system time was due to overhead in the network stack.

    Anyway, just another data point.

  42. Tim,

    There has been a lot of back and forth about this test, a lot more sound than light. Is there any chance you could take the time to do a more in-depth benchmarking? I think it would be really useful for everyone.

    I am one of the people who has not opted for any of the solutions you have tested, but is looking at all of them. I am drawn by vertx, but also a bit hesitant due to the lack of clarity about the whole genre of these solutions.

    Part of the problem is that they all seem to have a polarised following and (for some reason) evoke a lot of emotion in people.

    I think it would greatly benefit vert.x’s future to run a standardised set of tests on it. You don’t even need to run the same tests against node.js. What is more important is real testing and understanding of vert.x under pressure in a simulated heavy environment. Use of some standardised tests would mean that it is easier for us to understand just where vertx is and what its strengths are.

    I know this is asking work from someone else (I am just not competent enough with it) but hopefully someone will do it. I think it really does need done, regardless of what node etc are doing.
