Tag: Apache Hadoop

  • The problem with Big Data is not the Data

    There is a seemingly irrational obsession with how BIG your Big Data has to be before a magical unicorn appears and delivers the answers your business needs. Not a day goes by when I don’t see some swanky infographic reminding me that Facebook collects several yottabytes of data every day. OK, so I may have embellished that a…

  • It’s not how big your data is, it’s how you use it!

    Over the past couple of months I have met and talked to a lot of new and interesting people. Everywhere I go I encounter the same questions about Big Data; it’s like some sort of mass hysteria around what is, on the face of it, a simple concept: “volumes of data”. Example questions: “How much…

  • Hadoop: Processing ZIP files in Map/Reduce

    By popular request, I’ve updated my simple framework for processing ZIP files in Hadoop Map/Reduce jobs. Previously the only easy solution was to unzip files locally and then upload them to the Hadoop Distributed File System (HDFS) for processing. This adds a lot of unnecessary complexity when you are dealing with thousands of ZIP files; Java…
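    The ZIP post is about reading archives directly instead of unzipping them locally before uploading to HDFS. A minimal sketch of that core idea, assuming only the standard `java.util.zip` package (this is not the framework from the post): `ZipInputStream` walks the entries of any `InputStream`, so inside a Hadoop job the stream could come from `FileSystem.open(path)`. The class `ZipStreamSketch` and its methods are hypothetical names for illustration, and `buildZip` exists only to create test data in memory so the example runs without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class: a sketch, not the framework described in the post.
public class ZipStreamSketch {

    // Reads every file entry of a ZIP stream into memory, keyed by entry name.
    // In a Hadoop job the InputStream could come from FileSystem.open(path).
    public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no file content
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                // read() returns -1 at the end of the current entry, not the whole archive
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                entries.put(entry.getName(), buf.toByteArray());
            }
        }
        return entries;
    }

    // Builds a small ZIP archive in memory so the sketch runs without HDFS.
    public static byte[] buildZip(Map<String, String> files) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(f.getKey()));
                zip.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } // closing the ZipOutputStream writes the central directory
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("logs/a.txt", "hello");
        files.put("logs/b.txt", "big data");
        byte[] archive = buildZip(files);

        // Print each entry name with its decoded contents
        Map<String, byte[]> entries = readEntries(new ByteArrayInputStream(archive));
        for (Map.Entry<String, byte[]> e : entries.entrySet()) {
            System.out.println(e.getKey() + ": " + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

    Buffering each entry in memory keeps the sketch short and suits small files; for large archives a real job would more likely hand each entry's stream to the mapper rather than copying it into a byte array.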