So what is this Big Data stuff anyway? It seems to be everywhere these days. For those who have managed to completely avoid hearing about it, Wikipedia has its usual comprehensive, if slightly dry, description. Don’t worry if you glazed over after a couple of sentences; I did too. So here’s my take: it describes ways of managing and analysing huge amounts of data, such as the data which Google or Facebook has to analyse, or the data generated by Twitter (here is a lovely realtime map of Twitter feeds). The volume of data is growing at an exponential rate – it is estimated that 90% of the world’s data was generated in the last two years. And if that growth continues, it will still be true in two years’ time.
Such data is very valuable – not only to search engines like Google but also to shops hoping to target promotions more effectively and, of course, to anyone working on improving our health.
To be able to make sense of all this data, new technologies have been developed. A very common paradigm is MapReduce, which Google described in a paper published in 2004. Google did not release its own implementation, but the idea has since been turned into Open Source implementations such as Apache Hadoop.
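To give a flavour of the paradigm, here is a minimal sketch of the classic word-count example in plain Python. It is not Hadoop code and it runs on a single machine, so it only illustrates the shape of the idea: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration and do not come from any library.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

In a real cluster the map and reduce calls would be spread across many machines, each working on its own slice of the data, which is what lets the approach scale to very large volumes.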
In this blog I’m going to explore Big Data – not just by citing articles as I have done above, but by trying to get something actually running. The first step is to learn the technology; the next step is to find a project so I can get my teeth stuck in. If you have any suggestions please let me know (I have one in mind but need to see how viable it is first).