Monthly Archives: March 2014
Apache Pig is part of the Hadoop ecosystem. Its procedural language, Pig Latin, makes processing data on Hadoop a lot easier than writing MapReduce jobs by hand. You write a script in Pig Latin which, under the hood, is translated into MapReduce jobs. This is often the quickest way of […]
I’ve got a Mac and I’d like to get Hadoop set up. What do I do? I started by Googling, of course. First, get a Linux virtual machine set up: I went for Ubuntu 12.04 LTS running on VirtualBox. This guide worked for me for the basic […]
The British Library are running an exhibition on data visualisation until 26th May 2014 – it looks amazing! Here’s The Guardian’s writeup.
I spend a lot of time at work trying to get people to make decisions based on data I generate. The business people I present to have a lot on their minds, and anything more than a single-page presentation is unlikely to get any attention. Which means that the single page you do present […]
So what is this Big Data stuff anyway? It seems to be everywhere these days. For those who have managed to completely avoid hearing about it, Wikipedia has its usual comprehensive, if slightly dry, description. Don’t worry if you glazed over after a couple of sentences; I did too. So here’s my take: it describes ways […]