Month: March 2011
-
Consuming Twitter streams from Java
A while ago I was playing with the Twitter Streaming API, and one of the first things I wanted to do was collect a lot of data for off-line analysis (in Hadoop). I wrote a hacky little utility class called TwitterConsumer.java that did just the trick. Basically you just initialise it with a valid Twitter account…
-
Supermicro AOC-SASLP-MV8: DRIVER_TIMEOUT
I have been using the Supermicro AOC-SASLP-MV8 host bus adapter on quite a few Linux machines recently. Supermicro/Marvell only provide stable drivers for Windows and a select few (outdated) Linux distributions. I had to rely on the open-source support in the drivers/scsi/mvsas tree of the Linux kernel. Running this card with 8 x 2TB hard…
-
Reading ZIP files from Hadoop Map/Reduce
This post has been obsoleted by my update here: Hadoop: Processing ZIP files in Map/Reduce. One of the first use-cases I had for playing with Apache Hadoop involved extracting and parsing the contents of thousands of ZIP files. Hadoop doesn’t have a built-in reader for ZIP files; it just sees them as binary blobs. To solve…
-
Earthquake Data
With today’s events unfolding in Japan, I went looking for sources of earthquake data and stumbled upon the GEOFON Extended Virtual Network. After a quick bit of scripting, I collated all the data into a simple CSV format, making it easier to analyse. I have published the collated data on a new page…