Hadoop with Python

With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework.

Tag(s): Big Data Python

Publication date: 19 Oct 2015

ISBN-10: n/a

ISBN-13: n/a

Paperback: 71 pages

Views: 58,934

Type: Book

Publisher: O’Reilly Media, Inc.

License: n/a

Post time: 21 Mar 2018 07:00:00

Book Description from O'Reilly:

Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework.

Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.

About The Author(s)

Donald Miner (@donaldpminer) specializes in large-scale data analysis, enterprise architecture, and applying machine learning to real-world problems. He has architected and implemented dozens of mission-critical and large-scale data analysis systems within the U.S. Government and Fortune 500 companies. He has applied machine learning techniques to analyze data across several verticals, including financial, retail, telecommunications, healthcare, government intelligence, and entertainment.

Zachary Radtka (@zachradtka) is a platform engineer at the data science firm Miner & Kasch and has extensive experience creating custom analytics that run on petabyte-scale datasets. Zach is an experienced educator, having instructed collegiate-level computer science classes, professional training classes on Big Data technologies, and public technology tutorials. He has also created production-level analytics for many industries, including US government, financial, healthcare, telecommunications, and retail.
