Big Data with Hadoop

Course Objectives

By the end of this training you will:
– Understand the types of tools used in Big Data, and gain an architectural and functional view of Hadoop.
– Be able to apply what you have learned to progress in your career as a Big Data Developer/Consultant.

Prerequisites

This course requires basic knowledge of Linux/Unix Bash commands or of a programming language such as Java or Python. However, basic Linux commands are explained during the course, so people with no prior knowledge of Big Data can also follow it without difficulty.

Duration

3 Days

Course Preview

Introduction to Big Data

• What is Big Data?
• How it evolved
• Four dimensions (the four V's of Big Data)
• Use cases of Big Data
• Different tools to process Big Data

Introduction to Hadoop

• What is Hadoop?
• Components of the Hadoop ecosystem
• Why Hadoop?
• Industrial usage of the Hadoop ecosystem
• Installation and configuration of Hadoop
• Types of Hadoop platforms

HDFS (Hadoop Distributed File System)

• HDFS Introduction
• HDFS layout
• Importance of HDFS in Hadoop
• HDFS Features
• Storage aspects of HDFS
• Blocks in Hadoop
• Configuring block size
• Difference between Default and Configurable Block size
• Design Principles of Block Size
• HDFS Architecture
• HDFS daemons and their functionality
» NameNode
» Secondary NameNode
» DataNode
• HDFS use cases
• Detailed explanation of the configuration files
• Metadata, FsImage, edit log, Secondary NameNode and Safe Mode
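The block-size topics above come down to simple arithmetic. The following is a minimal Python sketch, assuming the Hadoop 2.x/3.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the function name `hdfs_block_layout` is hypothetical and only for illustration.

```python
MB = 1024 * 1024  # bytes in one mebibyte


def hdfs_block_layout(file_size, block_size=128 * MB, replication=3):
    """Return (number of blocks, size of the last block, raw bytes stored with replication).

    Assumes default-style settings; real clusters may configure
    dfs.blocksize and dfs.replication differently per file.
    """
    if file_size <= 0:
        # An empty file occupies no blocks.
        return 0, 0, 0
    num_blocks = -(-file_size // block_size)  # ceiling division in integers
    # Every block except the last is full; the last one holds the remainder.
    last_block = file_size - (num_blocks - 1) * block_size
    # A partial block uses only its actual size on disk, so raw usage is
    # simply the file size times the number of replicas.
    raw_storage = file_size * replication
    return num_blocks, last_block, raw_storage
```

For example, a 300 MB file is stored as three blocks (128 MB, 128 MB and a final 44 MB block) and, with three replicas, uses about 900 MB of raw disk across the DataNodes.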

Map Reduce

• What is Map Reduce?
• Map Reduce use cases
• Map Reduce functionality
• Importance of Map Reduce in Hadoop
• Processing Daemons of Hadoop
» JobTracker
» TaskTracker
• Input Split
» Role of Input Split in Map Reduce
» InputSplit size vs. block size
» Number of InputSplits vs. number of Mappers
• How to write a basic Map Reduce Program
» Driver Code
» Mapper Code
» Reducer Code
• Driver Code
- Importance of Driver Code in a Map Reduce program
- How to identify the Driver code in a Map Reduce program
- Different sections of Driver code
• Mapper Code
- Importance of Mapper Phase in Map Reduce
- How to write a Mapper class
- Methods in Mapper Class
• Reducer Code
- Importance of Reduce phase in Map Reduce
- How to write a Reducer class
- Methods in Reducer Class
• Input and output formats in Map Reduce
• Map Reduce API (Application Programming Interface)
- New API
- Deprecated (old) API
• Combiner in Map Reduce
- Importance of combiner in Map Reduce
- How to use the combiner class in Map Reduce?
- Performance trade-offs with respect to the Combiner
• Partitioner in Map Reduce
- Importance of Partitioner class in Map Reduce
- How to use the Partitioner class in Map Reduce
- HashPartitioner functionality
- How to write a custom Partitioner
• Joins in Map Reduce
- Map Side Join
- Reduce Side Join
- Performance trade-offs
• How to debug Map Reduce jobs in local and pseudo-distributed mode
• Introduction to Map Reduce Streaming (Hadoop Streaming)
• Data locality in Map Reduce
• Secondary Sorting Using Map Reduce
• Job Scheduling
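Hadoop Streaming lets the mapper and reducer be any program that reads lines from standard input and writes tab-separated key/value lines to standard output; the framework sorts the mapper output by key before it reaches the reducer. The sketch below is a minimal word count in Python under those assumptions. The file name `wordcount.py`, the helper `run_locally` and the `map`/`reduce` command-line arguments are hypothetical conventions for illustration, not part of Hadoop.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Reduce step: sum the counts per word.

    Relies on the input being sorted by key, which Hadoop's
    sort/shuffle phase guarantees between map and reduce.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical convention: run as `python3 wordcount.py map` or
    # `python3 wordcount.py reduce`; with no argument the script does nothing.
    stage = sys.argv[1] if len(sys.argv) > 1 else ""
    if stage == "map":
        sys.stdout.writelines(out + "\n" for out in mapper(sys.stdin))
    elif stage == "reduce":
        sys.stdout.writelines(out + "\n" for out in reducer(sys.stdin))
```

The same pipeline can be tried on a single machine with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`; on a cluster, the two stages are passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.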

Apache Pig

• Introduction to Pig
• Basic commands in Pig
• Installation
• Use cases
• Architecture and functionality

Hive

• Introduction to Hive/HiveQL
• Installation of Hive
• Difference between Hive and SQL
• Hive Architecture and Use cases
• Explanation of Data Types in Hive

Sqoop

• Introduction to Sqoop
• Installation
• Basic Commands in Sqoop
• Using Sqoop for data transfer
• Sqoop Functionality and Architecture
• Sqoop Export and Import Queries

Different Types of File systems

• Introduction to different file systems in Big data
• Use cases
• Types of data in real-world scenarios
• File structures and file sizes

Brief Description of Big data Tools

• Introduction and use cases for the following tools:
» Kafka
» Flume
» YARN
» Oozie

Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.