
HADOOP

Syllabus

Module 1 – Big Data, Hadoop, Introduction to Hadoop Architecture and HDFS
-----------------------------------------
The Big Data boom
Where traditional systems fall short
Difference between Hadoop architecture and traditional architecture
Main components of Hadoop
HDFS in detail
NameNode, DataNode, Secondary NameNode
JobTracker, TaskTracker
Anatomy of reading and writing data on HDFS

Module 2 – Hadoop 2.0, YARN, MRv2 & Hadoop 1.0 Limitations
-----------------------------------------
MapReduce Limitations
History of Hadoop 2.0
HDFS 2: Architecture
HDFS 2: High availability
HDFS 2: Federation
YARN Architecture
Classic vs YARN

Module 3 – Understanding Hadoop MapReduce Framework
-----------------------------------------
Overview of the MapReduce Framework
Use cases of MapReduce
MapReduce Architecture
Understand the concept of Mappers and Reducers
Anatomy of MapReduce Program
MapReduce Components – Mapper Class, Reducer Class, Driver code
Splits and Blocks
Understand Combiner and Partitioner

Module 4 – Advanced MapReduce – Part 1
---------------------------------------
Write your own Partitioner
Writing Map and Reduce in Python
Map Side Join
Distributed Join
Reduce Side Join
Counters
Joining Multiple datasets in MapReduce

Module 5 – Advanced MapReduce – Part 2
-------------------------------------
MapReduce internals
Understanding Input Format
Custom Input Format
MapReduce API
Hadoop Data Types
Using Writable and WritableComparable
Understanding Output Format
Sequence Files
Using Parquet Format
Using Avro Format

Module 6 – Apache Hive and HiveQL
------------------------------------
What is Hive
Hive DDL – Create/Show/Drop Database
Hive DDL – Create/Show/Drop Tables
Hive DML – Load Files into Tables
Hive DML – Inserting Data into Tables
Hive SQL – Select, Filter, Join, Group By
Hive Architecture & Components
Hive Data Model and Data Units
Difference between Hive and RDBMS

Module 7 – Advanced HiveQL
-----------------------------------------
Multi-Table Inserts
Joins
Grouping Sets, Cubes, Rollups
Custom Map and Reduce scripts
Hive UDF
Partitioning and Bucketing

Module 8 – Apache Pig
----------------------
Pig vs MapReduce
Pig components
Pig execution
Pig data types
Pig architecture
Pig Latin: Relational Operators
Pig Latin: Join and CoGroup
Pig Latin: Group and Union
Describe, Explain, Illustrate
Pig Latin: File Loaders
Pig Latin: Creating UDFs

Module 9 – Apache Flume, Apache Sqoop, Apache Oozie
-----------------------------------------
Sqoop & Flume:
--------------
Sqoop – How it works
Import/Export Data
Sqoop Architecture
Flume – How it works
Flume – Collecting logs
Flume – Working with the Twitter stream
Oozie:
------
Oozie – Simple/Complex Flow
Oozie – Components
Oozie – Service/Scheduler
Example Workflow
Use Cases – Time and Data triggers
Running/Debugging a Coordinator Job

Module 10 – NoSQL Databases
---------------------------
Introduction to NoSQL
CAP theorem
RDBMS vs NoSQL
Analytical (OLAP)
Overview of NoSQL DBs
Key-Value Stores: Redis, DynamoDB
Column Family: Cassandra, HBase
Graph Store: Neo4j
Document Store: Couchbase, MongoDB

Module 11 – Apache HBase
--------------------------
When/Why to use HBase
HBase Architecture/Storage
HBase Features
HBase Data Model
HBase Families
HBase Master
HBase vs RDBMS
Column Families
Access HBase Data
HBase API
Runtime modes
Running HBase

Module 12 – Apache Zookeeper
----------------------------
What is Zookeeper?
Who is using it
Installing and Configuring
Running Zookeeper
Zookeeper use cases

Module 13 – Apache Spark in Depth
--------------------------
Overview of Lambda Architecture
Spark Streaming
Spark SQL
Spark Processing

Module 14 – Apache Kafka
----------------------------------------
Architecture of Kafka
Installation
Kafka Operations
Producer and Consumer API

Module 15 – Apache Storm
-------------------------------------
Architecture of Storm
Components and Topology in Storm
Understand Spouts and Bolts
Twitter Streaming

Module 16 – Hadoop Admin Overview
----------------------------------------------

Module 17 – Hadoop Distributions
----------------------------------------------
Cloudera - first commercial enterprise distribution
Hortonworks - package..
Pivotal
MapR - MapRHDFS