Hadoop with Python
With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework.
Publication date: 19 Oct 2015
ISBN-10: n/a
ISBN-13: n/a
Paperback: 71 pages
Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework. Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.
About The Author(s)
Donald Miner (@donaldpminer) specializes in large-scale data analysis, enterprise architecture, and applying machine learning to real-world problems. He has architected and implemented dozens of mission-critical and large-scale data analysis systems within the U.S. Government and Fortune 500 companies. He has applied machine learning techniques to analyze data across several verticals, including financial, retail, telecommunications, healthcare, government intelligence, and entertainment.
Zachary Radtka (@zachradtka) is a platform engineer at the data science firm Miner & Kasch and has extensive experience creating custom analytics that run on petabyte-scale datasets. Zach is an experienced educator, having instructed collegiate-level computer science classes, professional training classes on Big Data technologies, and public technology tutorials. He has also created production-level analytics for many industries, including US government, financial, healthcare, telecommunications, and retail.