A few days ago I read Time magazine’s cover article “Can Google Solve Death?”, which my father-in-law brought over. I found the title intriguing, although after reading through the article, it turns out to be about Google investing in a new biotechnology company, Calico, which aims to extend human health and lifespan. Let’s face it, having a long life in poor health is no fun.
Google has not connected the dots on why a tech company would jump into biotech development, but some analysts speculate it is aiming to harness its big data processing capabilities to support further discoveries. It sort of makes sense, or maybe Larry Page is just paranoid about his own mortality.
But it does raise the next question: is big data the next big thing that techies need to follow, the way Java was a decade ago? Let’s have a look.
Big data arises because the world is more and more connected, and there are more tools capturing sources of information in the sciences. Looking into the future, more and more data will be captured. As of 2012, 2.5 quintillion (2.5×10¹⁸) bytes of data were created every day.
According to Wikipedia, big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. The size of the data itself is not the point; it can range from several gigabytes to many petabytes. The defining characteristic is that traditional data processing fails to handle it efficiently. And because data processing technologies keep improving, that threshold is a moving target: data that could not be processed efficiently before can be processed efficiently now.
But real-world problems cannot wait until traditional data processing technologies ripen. That’s where big data technologies jump in, in the hope that with such large data sets, governments and organisations can make better decisions. Right now big companies and Western governments are racing to profit from big data analytics, a market estimated at around 5.1 billion dollars for 2011 and growing at around 10% annually.
Gartner has compiled its top 10 technology trends, in which big data is mentioned as part of implementing hybrid IT and cloud architectures, where data can come in multiple forms and from multiple sources, not just a single data warehouse.
Hadoop is based on the approach Google developed to manage internet indexing; no product was available for the job back then, so Google had to build it themselves. The framework manages a huge cluster of commodity servers, each holding some copy of the data and sharing no memory or disk with the others. If a server goes down, it is not an issue, since there are several copies of the data. Hadoop maps an operation out to all of those servers, which run in parallel, and then reduces the results back into a single result set; this is called MapReduce. Now you can even create the cluster in the cloud.
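To make the map-then-reduce idea concrete, here is a minimal sketch of it in plain Python, using the classic word-count example. This is illustrative only, not Hadoop’s actual Java API: in a real cluster the map phase runs in parallel on many servers, each over its own replicated block of the data, while here everything runs in one process.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: each "server" emits (word, 1) pairs from its share of the data.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: combine all pairs sharing a key into a single result set.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["big data is big", "hadoop handles big data"]
result = reduce_phase(map_phase(documents))
print(result)  # {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'handles': 1}
```

The point of the split is that the map step needs no coordination between servers, so it scales out cheaply; only the reduce step has to gather results, and Hadoop’s shuffle stage handles that grouping by key.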
Unfortunately, it will be difficult for ordinary techies to teach themselves how to efficiently manage big data. If you install a framework like Hadoop on your desktop or laptop, the performance will be poor and you can’t really see it perform; it only shines when you feed it huge data, and the tooling for working with Hadoop is still rudimentary. If you are lucky, try to get yourself into a large project that deals with big data. The good news is that the tools are getting better, and if you can get your hands on them, you can be one of the players in this uncharted territory. If things were simple and unchallenging, you wouldn’t get much reward.