Tag: HDFS

  • Hadoop: Processing ZIP files in Map/Reduce

    By popular request, I’ve updated my simple framework for processing ZIP files in Hadoop Map/Reduce jobs. Previously, the only easy solution was to unzip the files locally and then upload them to the Hadoop Distributed File System (HDFS) for processing. This adds a lot of unnecessary complexity when you are dealing with thousands of ZIP files; Java…