• What is Big Data?
• What is Hadoop?
• Relation between Big Data and Hadoop.
• Why is there a need to move to Hadoop?
• Scenarios for adopting Hadoop technology in real-time projects
• Challenges with Big Data
» Storage
» Processing
• How Hadoop addresses Big Data challenges
• Comparison with Other Technologies
» RDBMS
» Data Warehouse
» Teradata
• Different Components of the Hadoop Ecosystem
» Storage Components
» Processing Components
HDFS (Hadoop Distributed File System)
• What is a Cluster Environment?
• Cluster Vs Hadoop Cluster.
• Significance of HDFS in Hadoop
• Features of HDFS
• Storage aspects of HDFS
• HDFS Architecture: the 5 daemons of Hadoop
» Block
» How to Configure block size
» Default Vs configurable block size
» Why HDFS block size is so large
» Design Principles of block size
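The block-size topics above can be illustrated with a minimal `hdfs-site.xml` fragment. Property names follow Hadoop 2.x, where `dfs.blocksize` defaults to 128 MB (Hadoop 1.x used `dfs.block.size` with a 64 MB default); the 256 MB value below is just an example override.

```xml
<!-- hdfs-site.xml: sets the HDFS block size for newly written files.
     Hadoop 2.x default is 128 MB (134217728 bytes). -->
<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value> <!-- example: 256 MB, overriding the default -->
  </property>
</configuration>
```

The block size can also be overridden per write (e.g. `hdfs dfs -D dfs.blocksize=268435456 -put file /path`), which is what "Default Vs configurable block size" refers to.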
» NameNode and its functionality
» DataNode and its functionality
» Job Tracker and its functionality
MapReduce
• Why is MapReduce essential in Hadoop?
• Processing Daemons of Hadoop
• Input Split
• Map Reduce Life Cycle
• MapReduce Programming Model
» Job Tracker
> Roles Of Job Tracker
> Drawbacks w.r.t. Job Tracker failure in a Hadoop Cluster
> How to configure Job Tracker in Hadoop Cluster
» Task Tracker
> Roles of Task Tracker
> Drawbacks w.r.t. Task Tracker failure in a Hadoop Cluster
» InputSplit
» Need Of Input Split in Map Reduce
» InputSplit Size
» InputSplit Size Vs Block Size
» InputSplit Vs Mappers
» Communication Mechanism of Job Tracker and Task Tracker
» Input Format Class
» Record Reader Class
» Success Case Scenarios
» Failure Case Scenario
» Retry Mechanism in Map Reduce
» Different phases of Map Reduce Algorithm
» Different Data types in Map Reduce
> Primitive Data types Vs Map Reduce Data types
» How to write basic Map Reduce Program
> Driver Code
> Mapper Code
> Reducer Code
» Driver Code
> Importance of Driver Code in a Map Reduce program
> How to Identify the Driver Code in Map Reduce program
> Different sections of Driver code
» Mapper Code
> Importance of Mapper Phase in Map Reduce
> How to Write a Mapper Class?
> Methods in Mapper Class
» Reducer Code
> Importance of Reduce phase in Map Reduce
> How to Write Reducer Class?
> Methods in Reducer Class
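The Driver/Mapper/Reducer split above can be sketched conceptually in plain Python (this is not the Hadoop Java API; function names and data are illustrative). It simulates the map, shuffle/sort, and reduce phases of a word-count job in memory:

```python
from collections import defaultdict

def mapper(offset, line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum all counts collected for one key."""
    yield (word, sum(counts))

def run_job(lines):
    """'Driver': wires the phases together, including shuffle/sort."""
    # Shuffle/sort: group intermediate values by key, as the framework would
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in mapper(offset, line):
            grouped[key].append(value)
    # Reduce each key group in sorted key order
    result = {}
    for key in sorted(grouped):
        for k, v in reducer(key, grouped[key]):
            result[k] = v
    return result

print(run_job(["hadoop is big", "big data is big"]))
# → {'big': 3, 'data': 1, 'hadoop': 1, 'is': 2}
```

In real Hadoop the same three roles appear as a driver class configuring the `Job`, a class extending `Mapper`, and a class extending `Reducer`.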
» IDENTITY MAPPER & IDENTITY REDUCER
» Input Formats in MapReduce
> TextInputFormat
> KeyValueTextInputFormat
> NLineInputFormat
> DBInputFormat
> SequenceFileInputFormat
> How to use the specific input format in Map Reduce
» Output Formats in MapReduce
> TextOutputFormat
> KeyValueTextOutputFormat
> NLineOutputFormat
> DBOutputFormat
> SequenceFileOutputFormat
> How to use a specific Output Format in MapReduce
» MapReduce API (Application Programming Interface)
> New API
> Deprecated API
» Combiner in Map Reduce
> Is the Combiner mandatory in MapReduce?
> How to use the combiner class in Map Reduce
> Performance trade-offs w.r.t. the Combiner
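The Combiner's role — local pre-aggregation of mapper output before the shuffle — can be sketched in plain Python (not the Hadoop API; data is illustrative):

```python
from collections import Counter

def map_output(line):
    # Raw mapper output for one input split: one (word, 1) pair per word
    return [(w, 1) for w in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate one mapper's output locally.
    The operation must be associative and commutative (like sum), so
    applying it zero, one, or many times never changes the final result
    - that is why the framework treats combiners as optional."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

pairs = map_output("to be or not to be")
print(len(pairs))            # 6 records would cross the network unshuffled
print(len(combine(pairs)))   # 4 records after local aggregation
```

The trade-off: fewer records shuffled across the network, at the cost of extra CPU and buffering on the map side — which only pays off when keys repeat within a split.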
» Partitioner in Map Reduce
> Importance of the Partitioner class in MapReduce
> How to use the Partitioner class in Map Reduce
> HashPartitioner functionality
> How to write a custom Partitioner
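The default HashPartitioner routes each key to a reducer by hashing it modulo the number of reduce tasks; a custom partitioner replaces that rule. A plain-Python sketch of both ideas (not the Hadoop API; the first-letter routing rule is purely illustrative):

```python
def hash_partition(key, num_reducers):
    """Default behaviour (HashPartitioner-style): hash mod reducer count.
    Within one run, every occurrence of a key lands on the same reducer."""
    return hash(key) % num_reducers

def custom_partition(key, num_reducers):
    """A custom partitioner: route keys deterministically by first letter
    (a-m -> reducer 0, n-z -> reducer 1). Illustrative rule only."""
    return 0 if key[0].lower() <= "m" else 1

print(custom_partition("hadoop", 2))  # → 0
print(custom_partition("spark", 2))   # → 1
```

In Hadoop this corresponds to extending `Partitioner` and overriding `getPartition(key, value, numReduceTasks)`; a skewed custom rule can unbalance reducer load, which is why the hash default is usually kept.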
» Compression techniques in Map Reduce
> Importance of Compression in Map Reduce
> What is CODEC
> Compression Types
> GzipCodec
> BZip2Codec
> LzoCodec
> SnappyCodec
> Configurations w.r.to Compression Techniques
> How to customize compression for a single job vs. all jobs
» Joins in MapReduce
> Map Side Join
> Reduce Side Join
> Performance Trade-offs
> Distributed cache
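A map-side join keeps the small table in memory on every mapper — this is what the distributed cache enables — and joins while streaming the large table, so no reduce phase (and no shuffle) is needed. A plain-Python sketch with illustrative data:

```python
# Small "dimension" table -- in Hadoop this file is shipped to every
# mapper via the distributed cache and loaded into memory at setup time.
users = {"u1": "alice", "u2": "bob"}

def map_side_join(order_line):
    """Mapper for a map-side join: stream one record of the large table
    and look up the small in-memory table; emit the joined record."""
    order_id, user_id, amount = order_line.split(",")
    name = users.get(user_id, "unknown")  # unmatched keys kept (outer-join style)
    return (order_id, name, amount)

orders = ["o1,u1,30", "o2,u2,15", "o3,u9,99"]
print([map_side_join(o) for o in orders])
# → [('o1', 'alice', '30'), ('o2', 'bob', '15'), ('o3', 'unknown', '99')]
```

A reduce-side join, by contrast, tags records from both tables with the join key and lets the shuffle bring matching keys together — it handles two large tables but pays the full shuffle cost, which is the performance trade-off listed above.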
» How to debug MapReduce Jobs in Local and Pseudo-Distributed Mode
» Introduction to MapReduce Streaming
» Data localization in Map Reduce
» Task Tracker and its functionality
» Secondary Name Node and its functionality
• Replication in Hadoop - Failover Mechanism
• Accessing HDFS
» Data Storage in Data Nodes
» Failover Mechanism in Hadoop - Replication
» Replication Configuration
» Custom Replication
» Design Constraints with Replication Factor
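The replication topics above map onto one configuration property. A minimal `hdfs-site.xml` fragment (property name is standard Hadoop; the value 2 is just an example override of the default of 3):

```xml
<!-- hdfs-site.xml: default replication factor for newly created files.
     HDFS default is 3; lower values save space but raise data-loss risk
     if a DataNode fails. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

Custom (per-file) replication can be set at write time with `-D dfs.replication=N` or changed afterwards with `hdfs dfs -setrep -w N /path`.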
» CLI (Command Line Interface) and HDFS Commands
» Java Based Approach
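The CLI approach above boils down to a handful of `hdfs dfs` subcommands. A short reference sketch (these commands require a running HDFS cluster, so they are shown for illustration only; paths are made up):

```
hdfs dfs -mkdir /user/demo            # create a directory in HDFS
hdfs dfs -put local.txt /user/demo/   # copy a local file into HDFS
hdfs dfs -ls /user/demo               # list directory contents
hdfs dfs -cat /user/demo/local.txt    # print a file's contents
hdfs dfs -get /user/demo/local.txt .  # copy a file back to local disk
hdfs dfs -rm /user/demo/local.txt     # delete a file
```

The Java-based approach does the same operations programmatically through the `FileSystem` API (`FileSystem.get(conf)` followed by `create`, `open`, `delete`, and similar calls).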