Node.js, Websockets, and the Twitter Gardenhose

Introduction

In this article I’m going to demonstrate how to use the Twitter Streaming API to send a stream of status updates to a web browser for real-time display using web sockets. We’ll implement a backend server module that streams the events over a web socket using the node.websocket.js framework, which provides a simple real-time HTTP server that implements the web socket protocol. Node.websocket.js is built on top of Node.js, a server-side JavaScript environment that provides an asynchronous API based on callbacks. Its event-driven style of programming makes it easy to write non-blocking code. This in turn lets you write simple servers that are efficient with CPU and memory, because a single event loop handles many connections instead of dedicating a thread to each one. Node.js is built on Google’s V8 JavaScript engine and implements parts of the CommonJS specification, which defines a standard library for server-side JavaScript.

Getting Started

First things first: you need Node.js, which you can get from GitHub. Install it in the usual way:

$ git clone git://github.com/ry/node.git
$ cd node
$ ./configure
$ make
$ sudo make install

Then we’ll need node.websocket.js, which you can also get from GitHub:

$ git clone git://github.com/Guille/node.websocket.js.git

This is basically an experimental implementation of the web socket API. You create a simple server-side module and a client-side page that opens a web socket to the server, and the two can then exchange data in both directions. There are a few examples included that you can take a look at, including a simple echo server and a chat server. Just fire up the server like so:

$ node runserver.js

and then open up one of the HTML files in the test/websocket/ directory in your browser. One catch though: you’ll need a browser that supports web sockets, such as a current version of Google Chrome. I’d recommend it anyway, as browsing the web with it does run a bit smoother.

You’ll also need curl, which is pretty common on any Linux box these days and can be installed through your package manager.

Setting Up the Server

The way we’re going to implement the server is to use the curl command to pull from the Twitter stream into a file. Twitter gives you back a series of JSON objects, which you can then parse and display. We’ll use this file in a moment to send the data over the web socket to the browser. There is some documentation on what you can do with the API, but we’ll keep it simple and track any tweets with ‘nyc’ in them:

$ curl -dtrack=nyc http://stream.twitter.com/1/statuses/filter.json -uUSERNAME:PASSWORD > sample.json

Now it’s time to write some server-side JavaScript. In the node.websocket.js checkout there is a modules/ directory. We’ll create a new module there called gardenhose.js, so the full path is node.websocket.js/modules/gardenhose.js.

In a nutshell, the module waits for the client to establish a connection, and then creates a child process that tails the file written by the curl command above. Any time the file is written to, the “output” listener is invoked, which runs our callback to parse the JSON into objects. We then use those objects to send a string back to the client with some readable information from the stream.
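As a rough illustration of the tailing idea, here is a minimal sketch written against the current Node child_process module rather than the older process API this article describes. The names createLineSplitter and monitorFile are hypothetical, and the sketch assumes a tail binary is available on the PATH.

```javascript
const { spawn } = require("child_process");

// Collects raw chunks and calls onLine once for every complete line.
// A chunk can end in the middle of a line, so the remainder is kept
// until the next chunk arrives.
function createLineSplitter(onLine) {
  let pending = "";
  return function feed(chunk) {
    pending += chunk;
    const lines = pending.split("\n");
    pending = lines.pop(); // last piece is incomplete (or empty)
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed !== "") onLine(trimmed);
    }
  };
}

// Hypothetical helper: follow `path` with `tail -f` and report each new line.
// Assumes a `tail` binary is available on the PATH.
function monitorFile(path, onLine) {
  const tail = spawn("tail", ["-f", path]);
  tail.stdout.setEncoding("utf8");
  tail.stdout.on("data", createLineSplitter(onLine));
  return tail; // call tail.kill() when the client disconnects
}
```

Buffering the incomplete last line matters because tail can hand over a chunk that stops partway through a JSON object; passing that fragment straight to a parser would fail.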

Let’s break it down just a bit more in case you are not familiar with Node. First, the module requires the system and filesystem modules from Node. Now, node.websocket.js basically just uses the Node API to implement a server in the websocket.js file. It looks at the request header to see which module to instantiate, and then invokes that module’s onData() method when a client sends over data.

Therefore the onData() method is the one we need to implement in our module. We’ll use the process object in Node to create a child process that emits an event called “output” each time the child sends data to stdout. The addListener call sets up a callback that will be invoked whenever our file receives more data from the Twitter stream. That data comes in the form of JSON objects, one per line, so we split the output on newlines to get an array of JSON strings and loop through them. Each time through the loop we parse one object and send the readable result back to the client, which is the web browser.
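The split-and-parse step can be sketched on its own, independent of how the data arrives. This is only an illustration: formatStatuses is a hypothetical name, and it assumes each status object carries text and user.screen_name fields. Anything that fails to parse or lacks those fields (such as a delete notice) is skipped.

```javascript
// Turn one batch of tail output into display strings.
// Assumes each line is a JSON status with `text` and `user.screen_name`
// fields; lines that are blank, incomplete, or lack those fields are skipped.
function formatStatuses(output) {
  const messages = [];
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue;
    let status;
    try {
      status = JSON.parse(line);
    } catch (err) {
      continue; // partial line from a write still in progress
    }
    if (status && status.user && typeof status.text === "string") {
      messages.push("@" + status.user.screen_name + ": " + status.text);
    }
  }
  return messages;
}

// Made-up sample data: one status and one delete notice.
const sample =
  '{"text":"hello nyc","user":{"screen_name":"alice"}}\n' +
  '{"delete":{"status":{"id":1}}}\n';
for (const message of formatStatuses(sample)) {
  console.log(message); // prints "@alice: hello nyc"
}
```

Inside onData(), each returned string would be what gets sent back over the socket.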

Just make sure the path you pass to the monitor_file() function is the file that the curl command is writing to. Then, to run the node.websocket.js server, you can just invoke it like so:

$ node runserver.js

However, if the server is running on a different host from your browser, you may want it to listen on more than just the default of localhost:

$ node runserver.js --host=0.0.0.0

Setting Up the Client

The goal here is to get the real-time tweets pumping through our web browser, so our client will just be a web page with a little bit of web socket JavaScript.

This is probably a bit more straightforward. We’re just implementing a few of the functions from the WebSocket interface. First we instantiate the WebSocket class with the hostname and port that the server is running on. The gardenhose in the path tells our server that we want to run the gardenhose.js module that we wrote above. The onopen() function is invoked when the socket is opened, and we send over the word “start”, which, as you’ll recall from above, tells the server that the client has connected and that it should start the child process that runs tail on our file. The onmessage() function is invoked any time the server sends data over the web socket. That data is the information we want to show on the page, so we append it to the HTML of our hose div. If the server closes the socket, onclose() is invoked, and we display that on the page.
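A minimal sketch of those three handlers might look like the following. The names connectGardenhose and escapeHtml are hypothetical, and the URL in the usage comment (host, port 8080) is a placeholder for wherever your server is listening. The WebSocket constructor is passed in as a parameter so the same function can be exercised outside a browser with a stand-in.

```javascript
// Escape markup so tweet text is shown literally instead of being
// interpreted as HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wire a web socket to a page element. `hose` is the element to append to.
function connectGardenhose(WebSocketImpl, url, hose) {
  const socket = new WebSocketImpl(url);
  socket.onopen = function () {
    socket.send("start"); // tell the server to begin tailing the file
  };
  socket.onmessage = function (event) {
    hose.innerHTML += "<p>" + escapeHtml(event.data) + "</p>";
  };
  socket.onclose = function () {
    hose.innerHTML += "<p>Connection closed</p>";
  };
  return socket;
}

// In a page containing <div id="hose"></div> (placeholder host and port):
// connectGardenhose(WebSocket, "ws://localhost:8080/gardenhose",
//                   document.getElementById("hose"));
```

Escaping the incoming text is a deliberate choice here: tweets are untrusted input, and appending them to innerHTML unescaped would let any markup in a tweet run in the page.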

Conclusion

We’ve written a server-side module using Node.js and the node.websocket.js framework that sends tweets over a web socket connection. We have also looked at the WebSocket API and learned how to implement some of the functions defined by its interface. Of course, you could use jQuery or your favorite JavaScript libraries to enhance how this looks to the user, but the basics of how a real-time display can communicate over web sockets are all here.

References

Async I/O – http://en.wikipedia.org/wiki/Asynchronous_I/O

CommonJS – http://www.commonjs.org/ and http://wiki.commonjs.org/wiki/CommonJS

V8 – http://code.google.com/p/v8

Node.js – http://nodejs.org

node.websocket.js – http://github.com/guille/node.websocket.js

Twitter Streaming API (Gardenhose) – http://apiwiki.twitter.com/Streaming-API-Documentation

Websocket API – http://dev.w3.org/html5/websockets