Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is a top-level Apache project sponsored by the Apache Software Foundation.

By the end of this training you will:
– Understand the types of tools in the Big Data ecosystem, and the architectural and functional view of Hadoop.
– Be able to apply the knowledge learned to progress in your career as a Big Data Developer/Consultant.

This course requires basic knowledge of Linux/Unix Bash commands or a programming language such as Java or Python. However, we explain basic Linux commands, so people with no prior Big Data knowledge can also take this course without hurdles.

Duration: 3 Days

Introduction to Big Data

• What is Big Data?
• How it Evolved
• Four Dimensions (the Four V's of Big Data)
• Use Cases of Big Data
• Different Tools to Process Big Data

Introduction to Hadoop

• What is Hadoop?
• Components of the Hadoop Ecosystem
• Why Hadoop?
• Industrial Usage of Hadoop Ecosystem Tools
• Installation and Configuration of Hadoop
• Types of Hadoop Platforms

HDFS (Hadoop Distributed File System)

• HDFS Introduction
• HDFS layout
• Importance of HDFS in Hadoop
• HDFS Features
• Storage aspects of HDFS
• Blocks in Hadoop
• Configuring block size
• Difference between Default and Configurable Block size
• Design Principles of Block Size
• HDFS Architecture
• HDFS Daemons and its Functionalities
• NameNode
• Secondary NameNode
• DataNode
• HDFS Use cases
• More Detailed Explanation of Configuration Files
• Metadata, FS Image, Edit Log, Secondary NameNode, and Safe Mode
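As an illustration of the block-size topics above, the arithmetic behind splitting a file into HDFS blocks can be sketched in plain Python (the 128 MB default and the 300 MB sample file are assumptions for the example, not part of the course material):

```python
# Sketch of how HDFS splits a file into fixed-size blocks.
# Block size is configurable (dfs.blocksize); 128 MB is the common default.

def split_into_blocks(file_size_bytes, block_size_bytes=128 * 1024 * 1024):
    """Return the sizes of the blocks a file of the given size occupies."""
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    sizes = [block_size_bytes] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes

# A hypothetical 300 MB file with the default 128 MB block size:
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))            # 3 blocks
print(blocks[-1] / 2 ** 20)   # 44.0 -- the last block is 44 MB, not a full 128 MB
```

The point of the sketch is the design principle from the list above: a block is a fixed-size unit of storage, and the final block of a file occupies only as much space as the remaining data.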

Map Reduce

• What is Map Reduce?
• Map Reduce Use Cases
• Map Reduce Functionalities
• Importance of Map Reduce in Hadoop
• Processing Daemons of Hadoop
» Job Tracker
» Task Tracker
• Input Split
» Role of Input Split in Map Reduce
» InputSplit Size Vs Block Size
» InputSplit Vs Mappers
• How to write a basic Map Reduce Program
» Driver Code
» Mapper Code
» Reducer Code
• Driver Code
- Importance of Driver Code in a Map Reduce program
- How to Identify the Driver Code in Map Reduce program
- Different sections of Driver code
• Mapper Code
- Importance of Mapper Phase in Map Reduce
- How to Write a Mapper Class?
- Methods in Mapper Class
• Reducer Code
- Importance of Reduce phase in Map Reduce
- How to Write Reducer Class?
- Methods in Reducer Class
• Input and Output Formats in Map Reduce
• Map Reduce API(Application Programming Interface)
- New API
- Deprecated API
• Combiner in Map Reduce
- Importance of combiner in Map Reduce
- How to use the combiner class in Map Reduce?
- Performance Tradeoffs with Respect to the Combiner
• Partitioner in Map Reduce
- Importance of Partitioner class in Map Reduce
- How to use the Partitioner class in Map Reduce
- Hash Partitioner Functionality
- How to write a custom Partitioner
• Joins in Map Reduce
- Map Side Join
- Reduce Side Join
- Performance Trade Off
• How to Debug MapReduce Jobs in Local and Pseudo-Distributed Mode
• Introduction to MapReduce Streaming
• Data localization in Map Reduce
• Secondary Sorting Using Map Reduce
• Job Scheduling
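To give a feel for the driver/mapper/reducer structure, the combiner, and the hash partitioner covered above, here is a pure-Python sketch of the word-count flow. No Hadoop is involved; every function name here is illustrative, standing in for the corresponding Hadoop class:

```python
from collections import defaultdict

# --- Mapper: emit (word, 1) for every word in a line of input ---
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

# --- Combiner: map-side pre-aggregation that cuts shuffle traffic ---
def combiner(pairs):
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return local.items()

# --- Partitioner: same idea as Hadoop's HashPartitioner,
# --- i.e. hash(key) % numReduceTasks decides which reducer gets the key ---
def partition(word, num_reducers):
    return hash(word) % num_reducers

# --- Reducer: sum the counts collected for each word ---
def reducer(word, counts):
    return word, sum(counts)

# --- "Driver": wires the phases together, as the Driver Code does in a job ---
def word_count(lines, num_reducers=2):
    # Map + combine
    mapped = combiner(pair for line in lines for pair in mapper(line))
    # Shuffle: group values by key, routed to a reducer by the partitioner
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for word, count in mapped:
        shuffled[partition(word, num_reducers)][word].append(count)
    # Reduce
    result = {}
    for groups in shuffled:
        for word, counts in groups.items():
            k, v = reducer(word, counts)
            result[k] = v
    return result

print(word_count(["to be or not to be"]))
# counts: to=2, be=2, or=1, not=1 (key order may vary between runs)
```

In a real job the driver configures input/output formats and submits to the cluster; this sketch keeps only the data flow so the roles of each phase stay visible.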
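Secondary sorting, listed above, is usually implemented with a composite key so that the framework's sort phase orders the values seen by each reducer. The idea can be sketched in Python (the customer/amount fields are made up for the example):

```python
from itertools import groupby
from operator import itemgetter

# Records: (customer, amount). Goal: group by customer, with each
# customer's amounts arriving at the "reducer" already sorted descending.
records = [
    ("alice", 30), ("bob", 10), ("alice", 50), ("bob", 40), ("alice", 20),
]

# Composite-key trick: sort on (natural key, secondary field) so the
# shuffle/sort phase does the ordering, then group on the natural key only.
sorted_records = sorted(records, key=lambda r: (r[0], -r[1]))

for customer, group in groupby(sorted_records, key=itemgetter(0)):
    amounts = [amount for _, amount in group]
    print(customer, amounts)
# alice [50, 30, 20]
# bob [40, 10]
```

In Hadoop the same split of responsibilities is expressed with a composite key class plus a grouping comparator, so the reducer never has to buffer and sort values itself.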

Apache Pig

• Introduction to Pig
• Basic commands in Pig
• Installation
• Use cases
• Architecture and functionality


Apache Hive

• Introduction to Hive/HiveQL
• Installation of Hive
• Difference between Hive and SQL
• Hive Architecture and Use cases
• Explanation of Data Types in Hive


Apache Sqoop

• Introduction to Sqoop
• Installation
• Basic Commands in Sqoop
• Usage of Sqoop in Data Transfer
• Sqoop Functionality and Architecture
• Sqoop Export and Import Queries

Different Types of File Systems

• Introduction to different file systems in Big data
• Use cases
• Types of Data in Real time
• File structures and Size of Files

Brief Description of Big data Tools

• Introduction and use cases for the tools below
• Kafka
• Flume