Mastering Spark Data Science on Azure

Apache Spark is a fast, in-memory data-processing engine with elegant and expressive development APIs that enable data scientists to execute streaming workloads, build sophisticated machine-learning models, and perform other tasks integral to extracting information from large datasets. Apache Spark for Azure HDInsight makes high-performance Spark clusters available to the masses, and Azure's Data Science Virtual Machine is perfect for learning Spark and other tools such as Jupyter and Microsoft R Server. In this landmark series, Microsoft data scientist Mark Tabladillo takes a deep dive into Apache Spark and the Spark ecosystem and demonstrates how to use the Spark support in Azure to make short work of big-data workloads and build sophisticated machine-learning models.