I’ve got a Mac and I’d like to get Hadoop set up on it. What do I do? I started by Googling, of course.
First, get a Linux Virtual Machine set up. I went for Ubuntu 12.04 LTS running on VirtualBox.
- Get a Linux installation running inside a virtual machine. This guide worked for me for the basic stuff.
- Install VirtualBox Guest Additions. These let you resize your Linux screen beyond the default 1024 x 768. Most instructions for doing this assume a Windows host. On a Mac host, follow the Guest Additions link from the VirtualBox download page, pick the VirtualBox version you installed, then find the VBoxGuestAdditions_X.X.X.iso file and download it.
- Mount the ISO image as a CD device on your virtual machine, then, inside Linux, double-click on the CD to auto-run the installer. Reboot the virtual machine afterwards.
- You should now have a clean Linux installation. Clone it now so you have a fallback in case it gets messed up later.
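Cloning can also be done from the Mac's terminal with VirtualBox's `VBoxManage clonevm` command rather than through the GUI. Here is a minimal sketch, assuming the VM is powered off first; the function name `clone_vm` and the VM names in the usage line are made up for illustration.

```shell
# Sketch: clone a VirtualBox VM from the host's command line.
# clone_vm is a hypothetical helper; VBoxManage ships with VirtualBox.
clone_vm() {
  # $1 = name of the existing VM, $2 = name for the new copy
  # --register adds the clone to the VirtualBox VM list straight away
  VBoxManage clonevm "$1" --name "$2" --register
}
```

For example, `clone_vm "Ubuntu 12.04" "Ubuntu 12.04 clean"` would leave a registered copy to fall back on.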
Next, install Hadoop. I went for 2.2.0.
- This guide almost worked. Complete everything down to the “Format Namenode” section. When you get there, follow the next bullet below before continuing with the rest of the guide.
- Formatting the namenode didn’t work for me. A bit of Googling revealed I had to add the following to .bashrc:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
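If you would rather not edit .bashrc by hand, the two lines can be appended from a terminal. This is only a sketch: the helper name `add_hadoop_exports` is my own invention, and it assumes `HADOOP_INSTALL` is already exported earlier in the same file, as the guide sets it up. The single quotes keep `$HADOOP_INSTALL` unexpanded so it is resolved when .bashrc is loaded, and the `grep` check stops the lines being added twice.

```shell
# Sketch: append the two Hadoop exports to a shell startup file, once only.
# add_hadoop_exports is a hypothetical helper name.
add_hadoop_exports() {
  # $1 = file to update, e.g. ~/.bashrc
  for line in \
    'export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native' \
    'export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"'
  do
    # Only append if an identical line is not already present
    grep -qxF -- "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
  done
}
```

Run `add_hadoop_exports ~/.bashrc` and then `source ~/.bashrc` (or open a new terminal) before retrying the format step.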
And that should be it! I got the word count examples running and everything looks OK.
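For reference, a word count run looks roughly like the sketch below. It assumes HDFS and YARN are already started, that `HADOOP_INSTALL` points at the Hadoop 2.2.0 directory, and that the examples jar sits in its usual place under `share/hadoop/mapreduce`. The function name `run_wordcount` and the `wc-input` / `wc-output` directory names are made up for illustration.

```shell
# Sketch: run the bundled wordcount example on one local text file.
# run_wordcount, wc-input and wc-output are hypothetical names.
run_wordcount() {
  # $1 = local text file to count words in
  jar="$HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar"

  # Relative HDFS paths resolve under your HDFS home directory
  hdfs dfs -mkdir -p wc-input || return 1
  hdfs dfs -put "$1" wc-input/ || return 1

  # The job fails if wc-output already exists, so remove it between runs
  hadoop jar "$jar" wordcount wc-input wc-output || return 1

  # Print the word counts written by the reducer(s)
  hdfs dfs -cat 'wc-output/part-r-*'
}
```

Calling `run_wordcount some-text-file.txt` should end by printing each word alongside its count.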
Since then I have been doing a bit more reading, and there are several vendors, such as Cloudera and Hortonworks, that aim to make this process a lot easier. They bundle together Hadoop and a bunch of Hadoop-related utilities such as Sqoop and Oozie into one slick package.
I will be investigating those next – look out for the next blog post.