Mastering Hadoop Data Science on Azure

Apache Hadoop is an open-source framework for extracting information from very large datasets using the MapReduce programming model. It distributes workloads across the nodes of a cluster for parallel processing, and it stores data in the Hadoop Distributed File System (HDFS), which provides high aggregate bandwidth across the machines in the cluster. In this series, Frank La Vigne takes a deep dive into Hadoop and its ecosystem and demonstrates how to run these tools locally or on Azure HDInsight clusters to make short work of big data.
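To make the MapReduce model described above more concrete, here is a minimal sketch of the classic word-count job, simulated in plain Python on a single machine. It is an assumption-laden illustration, not Hadoop's Java API or a Hadoop Streaming script: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and the "cluster" is just a sequential loop. It shows the three stages the framework coordinates: map emits key/value pairs, shuffle groups values by key, and reduce combines each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Hypothetical driver: on a real cluster each map call could run on a
    # different node in parallel; here every stage runs sequentially in memory.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data on Azure", "big clusters process big data"]
    print(word_count(docs))
```

Because map and reduce only ever see one line or one key at a time, the framework is free to spread those calls across many nodes. That independence is what lets Hadoop parallelize the work.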

Course Title | Author | Duration (hh:mm:ss) | Topic(s)
Introducing Hadoop | Frank La Vigne | 00:23:27 | Data Science, Hadoop, Azure, Big Data
Processing Big Data with MapReduce | Frank La Vigne | 01:07:23 | Data Science, Azure, Hadoop, Big Data
Using Hive to Query Hadoop | Frank La Vigne | 00:57:31 | Data Science, Azure, Hadoop, Big Data, Hive
Using Pig with Hadoop | Frank La Vigne | 00:50:13 | Data Science, Pig, Hadoop, Big Data
Using HBase | Frank La Vigne | 00:50:35 | Data Science, HBase, Hadoop, Big Data