Thilina and I have been working on a book on Hadoop, and it is now available! You can get the book from Packt at http://www.packtpub.com/hadoop-mapreduce-cookbook/book or from Amazon at http://www.amazon.com/Hadoop-MapReduce-Cookbook-ebook/dp/B00B71KZRE/. It is available in both paperback and e-book formats.
Hadoop is an implementation of the MapReduce pattern first introduced by Google in their seminal paper MapReduce: Simplified Data Processing on Large Clusters. It provides a programming model that lets users process large datasets using many computers.
For example, suppose there are a few gigabytes of log files that contain the access logs for a server, and you want to read those log files and count the number of hits received by each web page on the server. You could write a program that walks through the log files and processes them. However, if the log files are large, you would need to process them using many computers. Writing a system that processes such log files using many computers would be a significant undertaking.
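As a rough illustration of the single-machine approach, here is a minimal Python sketch that counts hits per page. It assumes each log line follows an Apache-style format with the request in double quotes (for example `"GET /index.html HTTP/1.1"`); the names `page_of`, `count_hits`, and `SAMPLE_LOG` are made up for this example, and a real log may well need a different parser.

```python
from collections import Counter


def page_of(line):
    """Return the requested path from one access-log line, or None if malformed.

    Assumes the request is the first double-quoted field on the line,
    e.g. "GET /index.html HTTP/1.1" (an assumption about the log format).
    """
    parts = line.split('"')
    if len(parts) < 2:
        return None
    request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
    return request[1] if len(request) >= 2 else None


def count_hits(lines):
    """Walk through the log lines one by one and count hits per page."""
    hits = Counter()
    for line in lines:
        page = page_of(line)
        if page is not None:
            hits[page] += 1
    return hits


# Hypothetical sample data standing in for a real log file.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2012:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2012:13:56:01 -0700] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [10/Oct/2012:13:57:12 -0700] "GET /index.html HTTP/1.1" 200 2326',
    'a malformed line without a request',
]

hits = count_hits(SAMPLE_LOG)
print(hits.most_common())
```

This works well while the logs fit comfortably on one machine; the loop is inherently sequential, which is exactly what becomes a problem as the data grows.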
However, toolkits like Hadoop that support the MapReduce framework let users write just two functions, called "map" and "reduce", and the framework takes care of the details of processing the log files. Furthermore, Hadoop handles details like communicating between nodes, scheduling sub-tasks, handling failures, and debugging.
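To make the two functions more concrete, here is a minimal sketch of the hit-counting job in plain Python. It does not use Hadoop's actual API: `run_job` is a toy, in-process stand-in for the framework, and `map_hit`, `reduce_hits`, and `SAMPLE_LOG` are names invented for this example. It also assumes the request appears in double quotes on each log line.

```python
from collections import defaultdict


def map_hit(line):
    """map: emit a (page, 1) pair for one log line; emit nothing if malformed."""
    parts = line.split('"')
    if len(parts) >= 2:
        request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
        if len(request) >= 2:
            yield request[1], 1


def reduce_hits(page, counts):
    """reduce: add up all the counts emitted for a single page."""
    return page, sum(counts)


def run_job(lines, mapper, reducer):
    """Toy stand-in for the framework: map every line, group by key, reduce.

    A real framework would run these steps across many machines; here
    everything happens in one process, purely for illustration.
    """
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical sample data standing in for a real log file.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2012:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2012:13:56:01 -0700] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [10/Oct/2012:13:57:12 -0700] "GET /index.html HTTP/1.1" 200 2326',
    'a malformed line without a request',
]

result = run_job(SAMPLE_LOG, map_hit, reduce_hits)
print(result)
```

The point of the split is that `map_hit` looks at one line at a time and `reduce_hits` looks at one page at a time, so neither needs to know how the work is distributed. That is what lets a framework run many copies of each in parallel.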
Log processing is only a trivial example of what can be implemented with the MapReduce paradigm. It can be, and is being, used to implement many simple and complex data processing tasks around the world. Users can also extend it in many ways, for example to handle different message formats.
The book starts with simple introductory-level details, but goes on to cover many MapReduce patterns such as analytics, clustering, and recommendations. Each one is explained using a recipe and accompanied by code samples. We believe it has recipes that will help beginners as well as experienced MapReduce developers.
Some important information:
- The book uses Hadoop 1.0.x releases
- The table of contents is at http://www.amazon.com/Hadoop-MapReduce-Cookbook-ebook/dp/B00B71KZRE/ (click on the image)
- The sample chapter, which covers analytics using MapReduce, is available from https://www.packtpub.com/sites/default/files/9781849517287_Chapter_06.pdf
This finishes a roughly year-long process of writing, editing, and re-editing. It took a lot of time, but it was of course a great experience. I would like to thank the editorial team for all their help and feedback. We hope the book will be useful for Hadoop developers.