Month: March 2011

  • Consuming Twitter streams from Java

    A while ago I was playing with the Twitter Streaming API. One of the first things I wanted to do was collect a lot of data for offline analysis (in Hadoop). I wrote a hacky little utility class called TwitterConsumer.java that did just the trick. Basically you just initialise it with a valid Twitter account…

  • Supermicro AOC-SASLP-MV8: DRIVER_TIMEOUT

    I have been using the Supermicro AOC-SASLP-MV8 host bus adapter on quite a few Linux machines recently.  Supermicro/Marvell only provide stable drivers for Windows and a select few (outdated) Linux distributions.  I had to rely on the open-source support in the drivers/scsi/mvsas tree of the Linux kernel. Running this card with 8 x 2TB hard…

  • Reading ZIP files from Hadoop Map/Reduce

    This post has been obsoleted by my update here: Hadoop: Processing ZIP files in Map/Reduce. One of the first use-cases I had for playing with Apache Hadoop involved extracting and parsing the contents of thousands of ZIP files. Hadoop doesn’t have a built-in reader for ZIP files; it just sees them as binary blobs. To solve…

  • Earthquake Data

    With today’s events unfolding in Japan, I went looking for sources of earthquake data and stumbled upon the GEOFON Extended Virtual Network. After a quick bit of scripting, I collated all the data into a simple CSV format, making it easier to analyse. I have published the collated data on a new page…