Oct 27 2012

Looking Under the Covers of StatsD

Category: Debuggingjgoulah @ 12:24 PM

Intro

StatsD is a network daemon that runs on the Node.js platform and listens for statistics, like counters and timers. Packets are then sent to one or more pluggable backend services. The default service is Graphite. Every 10 seconds the stats sent to StatsD are aggregated and forwarded on to this backend service. It can be useful to see what stats are going through both sides of the connection – from the client to StatsD and then from StatsD to Graphite.

Management Interface

The first thing to know is there is a simple management interface built in that you can interact with. By using either telnet or netcat you can find information directly from the command line. By default this is listening on port 8126, but that is configurable in StatsD.

The simplest thing to do is send the stats command:

% echo "stats" | nc statsd.domain.com 8126          
uptime: 365
messages.last_msg_seen: 0
messages.bad_lines_seen: 0
graphite.last_flush: 5
graphite.last_exception: 365

This tells us a bit about the current state of the server, including the uptime, and the last time a flush was sent to the backend. Our server has only been running for 365 seconds. It also lets us know when the length of time since StatsD received its last message, bad lines sent to it, and the last exception. Things look pretty normal.

You can also get a dump of the current timers:

(echo "timers" | nc statsd.domain.com 8126) > timers

As well as a dump of the current counters:

(echo "counters" | nc statsd.domain.com 8126) > counters

Take a look at the files generated to get an idea of the metrics StatsD is currently holding.

On the Wire

Beyond that, its fairly simple to debug certain StatsD or Graphite issues by looking at whats going on in realtime on the connection itself. On the StatsD host, be sure you’re looking at traffic across the default StatsD listen port (8125), and specifically here I’m grep’ing for the stat that I’m about to send which will be called test.goulah.myservice:

% sudo tcpdump -t -A -s0 dst port 8125 | grep goulah
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

Then we fake a simple client on the command line to send a sample statistic to StatsD like so:

echo "test.goulah.myservice:1|c" | nc -w 1 -u statsd.domain.com 8125

Back on the StatsD host, you can see the metric come through:

e......."A.test.goulah.myservice:1|c

There is also the line of communication from StatsD to the Graphite host. Every 10 seconds it flushes its metrics. Start up another tcpdump command, this time on port 2003, which is the port carbon is listening on the Graphite side:

% sudo tcpdump -t -A -s0 dst port 2003 | grep goulah
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

Every 10 seconds you should see a bunch of stats go by. This is what you are flushing into the Graphite backend. In our case I’m doing a grep for goulah, and showing the data aggregated for the metric we sent earlier. Notice there are two metrics here that look slightly different than the metric we sent though. StatsD sends two lines for every metric. The first is the aggregated metric prefixed with the stats namespace. StatsD also sends the raw data prefixed by stats_counts. This is the difference in the value per second calculated and the raw value. In our case they identical:

stats.test.goulah.myservice 0 1351355521
stats_counts.test.goulah.myservice 0 1351355521

Conclusion

Now we can get a better understanding of what StatsD is doing under the covers on our system. If metrics don’t show up on the Graphite side it helps to break things into digestible pieces to understand where the problem lies. If the metrics aren’t even getting to StatsD, then of course they can’t make it to Graphite. Or perhaps they are getting to StatsD but you are not seeing the metrics you would expect when you look at the graphs. This is a good start on digging into those types of problems.

Tags: , , , , , ,


Mar 15 2010

Node.js, Websockets, and the Twitter Gardenhose

Category: Real-time Webjgoulah @ 3:08 PM

Introduction

In this article I’m going to demonstrate how to use the Twitter Streaming API to send a stream of status updates for real time display in a web browser using Web Sockets. We’ll implement a backend server module that streams the events over a web socket using the node.websocket.js framework, which provides a simple realtime HTTP server that implements the web socket protocol. Node.websocket.js is built on top of Node.js which provides an Asynchronous API using callbacks. Node.js is a server side javascript library that enforces an event driven style of programming which allows you to develop non-blocking code easily. This in turn allows you to write simple servers that are very CPU and Memory efficient because you don’t have multiple threads taking up shared resources. Node.js is implemented using Google’s V8 javascript engine and also the CommonJS specification which is defining a standard library for server side javascript.

Getting Started

First things first, you need node.js, which you can get from github. Install in the usual way

$ git clone git://github.com/ry/node.git
$ cd node
$ ./configure
$ make
$ sudo make install

Then we’ll need node.websocket.js which you can also get from github

$ git clone git://github.com/Guille/node.websocket.js.git

This is basically an experimental implementation of the web socket API. You can create simple server side modules and then a client side implementation that opens up a web socket to the server in which you can then exchange data in both directions. There are a few examples included that you can take a look at including a simple echo server and a chat server. Just fire up the server like so

$ node runserver.js

and then open up one of the html files in the test/websocket/ directory in your browser. One catch though, you’ll need a current browser such as Google Chrome. I’d recommend it anyway, as browsing the web with it does run a bit smoother.

You’ll also need curl, which is pretty common on any linux box these days and can be installed through your package manager.

Setting Up the Server

The way we’re going to implement the server is to use the curl command to pull from the twitter stream into a file. Twitter gives you a bunch of JSON objects back which you can then parse and display. We’ll use this file in a moment to send the data over the web socket to the browser. There is some documentation on what you can do with the API but we’ll keep it simple and search for any tweets with ‘nyc’ in them

$ curl -dtrack=nyc http://stream.twitter.com/1/statuses/filter.json  -uUSERNAME:PASSWORD > sample.json

Now its time to write some server side javascript. In the node.websocket.js checkout there is a modules/ directory. We’ll create a new module called gardenhose.js. So the full path is node.websocket.js/modules/gardenhose.js

In a nutshell this is waiting for the client to establish a connection, and then it creates a child process that tails our file from the curl command above. Anytime the file is written to the “output” listener is invoked, which runs our callback to parse the JSON into objects that we can then use to send a string back to the client with some readable information from the stream.

Lets break it down just a bit more in case you are not familiar with Node. First we are requiring the system and filesystem modules from Node. Now, node.websocket.js basically just uses the node API to implement a server in the websocket.js file. It looks at the request header to see which module to instantiate and then invokes your onData() method when a client sends over data.

Therefore the onData method is the one we need to implement in our module above. We’ll use the process object in Node to create a child process that emits an event called “output” each time the child sends data to stdout. So the addListener call sets up a callback that will be invoked when our file receives more data from the twitter stream. That data comes in the form of JSON objects, one per line. So we split on the lines to create an array of JSON objects, and loop through them. Each time through the loop we’re sending this data back to the client, which is the web browser.

Just make sure the file you pass in is the correct path to the file you are outputting to from the curl command in the monitor_file() function. Then to run the node.websocket.js server you can just invoke it like so

$ node runserver.js

However if you are running the server from another host, you may want to listen on more than just the default of localhost

$ node runserver.js --host=0.0.0.0

Setting Up the Client

The goal here is to get the realtime tweets pumping through our web browser so our client will just be a web page with a little bit of web socket javascript.

This is probably a bit more straightforward. We’re just implementing a few of the functions from the Websocket interface. First we’re instantiating the WebSocket class with the hostname and port that we’re running the server on. The gardenhose in the path is to tell our server that we want to run the gardenhose.js module that we wrote above. The onopen() function is invoked when the socket is opened, and we send over the word “start” which if you recall from above understands the client has connected and to start the child process that runs tail on our file. The onmessage() function is invoked anytime the server is sending data over the web socket, which is the information we want to show on the page, so we append it to the HTML of our hose div. If the server closes the socket then onclose() is invoked, and we display that on the page.

Conclusion

We’ve written a server side module using Node.js and the node.websocket.js framework that will send tweets over a web socket connection. We have also looked at the WebSocket API and learned how to implement some of the functions defined by its interface. Of course, you could use JQuery or your favorite javascript libraries to enhance how this looks to the user, but the basics are all here in how the communication of a real time display can work with web sockets.

References

Async I/O – http://en.wikipedia.org/wiki/Asynchronous_I/O

CommonJS – http://www.commonjs.org/ and http://wiki.commonjs.org/wiki/CommonJS

V8 – http://code.google.com/p/v8

Node.JS – http://nodejs.org

Node.Websocket.JS – http://github.com/guille/node.websocket.js

Twitter Streaming API (Gardenhose) – http://apiwiki.twitter.com/Streaming-API-Documentation

Websocket API – http://dev.w3.org/html5/websockets

Tags: , , , , , , , , , , ,