By: +David Herron; Date: 2011-05-03 13:41
To start at the beginning, we turn to a blog post by Alex Russell titled "Comet: Low Latency Data for the Browser" (it appears you'll have to yahoogle the phrase and then look at the Google cache copy of the page). He was co-creator of the Dojo toolkit, co-creator of cometD, and co-author of the Bayeux Protocol, all of which circle around the COMET protocol. The post focuses on "low latency data transfer to the browser", where a COMET application "can deliver data to the client at any time, not only in response to user input" and the data is "delivered over a single, previously-opened connection". The model is radically different from the traditional model of a browser opening an HTTP connection, doing a GET or POST, receiving data from the server, then closing the connection.
Because a "previously-opened connection" is maintained between server and browser, servers will see large numbers of open connections. Apache, and other thread-per-connection server architectures, do not scale well with large numbers of connections.
Alex's blog post suggests that a "long lived page" will go "stale", and that if the page were to maintain a connection to the server, the server could update the page as new content arrives. From his discussion I am imagining scenarios with reader-generated comments or discussion. For example, on twitter.com the Twitter servers notify you as new tweets arrive. Likewise, facebook.com will nowadays notify you of comments while you're doing other things. You can also imagine a blog commenting system which dynamically updates the comment thread on the page as readers add comments. The Disqus commenting system almost implements that idea.
Alex suggested "New server software is often required to make applications built using Comet scale" because of the open connection maintained to the server for each browser (client). He went on to suggest "event-driven IO on the server side" as the solution, and named some candidates. Apache was supposed to "provide a Comet-ready worker module in the upcoming 2.2 release" and he named "tools like Twisted, POE, Nevow, mod_pubsub, and other higher-level event-driven IO abstractions". Modern operating systems all now "support some sort of kernel-level event-driven IO system", and in Java the NIO layer is a good basis for event-driven IO, which has in turn led to implementations in Java appservers like Tomcat.
He asks us to imagine an AJAX request that starts with an HTTP GET. The data might not be available, and rather than just closing the connection the server leaves it open and sends back the HTTP response when the data is available. It then leaves the HTTP connection open, and as further data arrives the server sends that data as well.
Essentially, this allows the server to "push" or "stream" data to the web client via standard HTTP GETs instead of, for example, having the client poll for updates at regular intervals.
Responding to the obvious question of whether all these open connections scale, he writes:
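The held-open GET described above can be sketched in Node.js roughly as follows. This is only a minimal illustration, not code from either blog post: the names openResponses, handleRequest, push and createCometServer are made up for this example, and the plain-text content type is an assumption.

```javascript
const http = require("http");

// Hypothetical registry of clients whose GET request is still open.
const openResponses = new Set();

// Answer the GET with headers only, then keep the connection open
// by never calling res.end().
function handleRequest(req, res) {
  res.writeHead(200, { "Content-Type": "text/plain" });
  openResponses.add(res);
  // Forget the client once its connection goes away.
  res.on("close", () => openResponses.delete(res));
}

// Push one message to every client that is still connected and
// report how many clients received it.
function push(message) {
  for (const res of openResponses) {
    res.write(message + "\n");
  }
  return openResponses.size;
}

// Build the server without starting it; the caller decides when to listen.
function createCometServer() {
  return http.createServer(handleRequest);
}
```

To try it, call createCometServer().listen(8080) (the port is arbitrary) and invoke push("some update") from a timer or from whatever produces new data; a client that keeps reading the response body sees each line as it is written.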
Done right and used right this can be a very efficient way to send events to web clients; in fact, it can save a lot of unnecessary “polling” requests. Not only can it be more efficient, updates will get to the clients quicker (lower latency) than when polling.
Maintaining an open connection means keeping a socket open, and keeping some data in the server. Threads or processes per connection are not a requirement, but more an unfortunate architecture choice of some servers. As he puts it, what "you do not want to happen is for the server to block for example a thread per request when it waits for data to arrive – that would be Comet done in a fashion that will not scale well."
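To make the point about "some data in the server" concrete, here is a small simulation in which each waiting client costs one stored callback rather than one blocked thread. It is a sketch under stated assumptions: waiting, waitForData and dataArrived are hypothetical names, and the clients are simulated rather than real sockets.

```javascript
// Hypothetical in-memory list of parked requests: one small callback each.
const waiting = [];

// Park a client until data arrives; nothing blocks while it waits.
function waitForData(onData) {
  waiting.push(onData);
}

// Deliver data to every parked client, then clear the list.
function dataArrived(data) {
  const parked = waiting.splice(0, waiting.length);
  for (const onData of parked) {
    onData(data);
  }
  return parked.length;
}

// Park 10,000 simulated clients on a single thread.
let delivered = 0;
for (let i = 0; i < 10000; i++) {
  waitForData(() => { delivered += 1; });
}
console.log(waiting.length); // 10000

const served = dataArrived("new comment");
console.log(served, delivered); // 10000 10000
```

A real server would store a response object or socket instead of a bare callback, but the cost per idle client stays in the same category: memory, not a thread.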
Node.js as THE answer?
The way I read the above background is that they were explicitly calling for a server architecture which Node.js fortuitously implements. I don't know if Ryan Dahl had COMET in mind, but his creation fits the bill to a T.
The Node.js architecture is an asynchronous, event-driven programming model where callback functions are dispatched as events propagate through the system. The events can be I/O such as network traffic, or could be driven from other sources, because any EventEmitter object can emit events.
The result is a platform that makes it really easy to implement event-driven server applications (and even clients).
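As a small illustration of events that do not come from I/O, the sketch below uses Node's built-in EventEmitter class. The CommentFeed class, its post method and the "comment" event name are invented for this example.

```javascript
const { EventEmitter } = require("events");

// Hypothetical event source: a comment feed that is not tied to any I/O.
class CommentFeed extends EventEmitter {
  post(author, text) {
    // emit() calls every listener registered for "comment", in order.
    this.emit("comment", { author, text });
  }
}

const feed = new CommentFeed();
const seen = [];

// This callback is dispatched each time a "comment" event is emitted.
feed.on("comment", (comment) => {
  seen.push(`${comment.author}: ${comment.text}`);
});

feed.post("alice", "first!");
feed.post("bob", "nice post");
console.log(seen); // [ 'alice: first!', 'bob: nice post' ]
```

Network sockets and HTTP requests in Node.js expose the same on/emit interface, which is why one callback style covers both I/O events and application-defined ones.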
A slide show
The following is a two-year-old slide deck going over this territory.