One of the most interesting talks I attended at Hadoop World was by Revolution Analytics. A couple of months ago they released a set of open-source packages on GitHub which marry the ‘R’ statistical programming language to the power of Hadoop. As well as running R programs across Map/Reduce, the packages include support for working with datasets stored in HDFS and HBase.
Today I had the pleasure of installing it on our ‘petunia’ cluster. I grabbed the ‘Revolution R Community’ edition from the website and set about installing it. They provide a RHEL-friendly download and I’m on CentOS 6, so I didn’t foresee any big issues.
The good news is that CentOS 6 has most of the dependencies needed; to be exact, the following were all fine:
However, all was not well: the libraries ‘tk’ and ‘libicu’ weren’t happy. The latest versions available from CentOS 6 were just too revolutionary, and the installer stopped dead in its tracks.
1. Firstly remove the current (CentOS 6) versions:
$ sudo yum erase tcl tcl-devel tk tk-devel libicu libicu-devel
2. Prevent yum from updating them by editing /etc/yum.repos.d/CentOS-Base.repo and adding the following line to both the [updates] sections of the file:
exclude=libicu libicu-devel tcl tcl-devel tk tk-devel
3. Download the RPMs for those packages directly from your nearest CentOS 5 mirror.
4. Install these RPMs manually…
5. Re-run the installer, and all should be well!
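Editing the repo file by hand in step 2 works fine, but if you have several nodes to fix it may be worth scripting. Here is a minimal sketch of how that could look. Note that add_exclude is a hypothetical helper of my own naming (it is not part of yum), and it assumes section headers sit on their own line in the usual [name] form. It is not idempotent, so try it on a copy of the file first.

```shell
#!/bin/sh
# Sketch only: add an "exclude=" line directly under named sections of a
# yum .repo file. add_exclude is a hypothetical helper, not a yum command.
#
# Usage: add_exclude FILE "pkg1 pkg2 ..." SECTION [SECTION ...]
add_exclude() {
    repo_file=$1
    packages=$2
    shift 2
    # Pad with spaces so whole section names can be matched literally.
    sections=" $* "

    awk -v pkgs="$packages" -v wanted="$sections" '
        { print }                       # copy every line through unchanged
        /^\[.*\]$/ {                    # a section header such as [updates]
            name = substr($0, 2, length($0) - 2)
            if (index(wanted, " " name " ") > 0)
                print "exclude=" pkgs   # add the line right after the header
        }
    ' "$repo_file" > "$repo_file.new" && mv "$repo_file.new" "$repo_file"
}

# Example, run against a copy rather than the live file:
#   cp /etc/yum.repos.d/CentOS-Base.repo /tmp/CentOS-Base.repo
#   add_exclude /tmp/CentOS-Base.repo \
#       "libicu libicu-devel tcl tcl-devel tk tk-devel" updates
```

Running it twice adds the line twice, so check the result with grep '^exclude=' before copying the file back into place.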