Mastering Hadoop Data Science on Azure

Apache Hadoop is an open-source framework for extracting information from very large datasets using the MapReduce programming model. It distributes workloads across the nodes of a cluster for fast parallel processing, and it uses the Hadoop Distributed File System (HDFS) to provide high aggregate bandwidth across the machines in the cluster. In this series, Frank La Vigne takes a deep dive into Hadoop and the Hadoop ecosystem and demonstrates how to run these tools locally or in Azure HDInsight clusters to make short work of big data.
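
As a rough illustration of the MapReduce programming model mentioned above, here is a minimal single-process sketch of a word count in Python. It is not Hadoop code and is not taken from the series: the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real Hadoop job would run the map and reduce steps in parallel on different nodes, with the framework handling the grouping step in between.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, which a MapReduce
    # framework does between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: combine every value seen for one key into a single count.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in sequence in one process (illustration only).
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data moves fast"])
print(counts)
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on its own key, both steps can in principle be split across many machines, which is the property a cluster exploits.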