Hadoop Online Training

Hadoop Online Training By IT Experts :

The IQ online training facility offers Hadoop online training by trainers who have expert knowledge of Hadoop and a proven record of training hundreds of students. Our Hadoop training is regarded as the best online training by our students and corporate clients. We are training partners for corporate clients such as IBM, and we train students from countries including the USA, UK, Singapore, UAE, Australia, and India. Our Hadoop training is your one-stop solution to learn, practice, and build a career in this field from the comfort of your home, with flexible class schedules.

Hadoop Introduction :

Hadoop is open-source software that enables distributed processing of large data sets across clusters of commodity servers using simple programming models. It is designed to scale from a single server to thousands of machines, each offering local computation and storage. Rather than depending on hardware for high availability, the framework detects and handles failures at the application layer. This course helps you address these challenges and take advantage of the core value Hadoop provides, in a vendor-neutral way.

IQ Training offers the Hadoop Online Course in a true global setting.

Hadoop Online Training Concepts :

Duration: 70 Hours

Week 1:

Day 1:


1.Understanding BigData.

a.    What is Big Data?
b.    Big-Data characteristics.

2.Hadoop Distributions:

a.    Hortonworks

b.    Cloudera

c.    Pivotal HD

d.    Greenplum.

3.Introduction to Apache Hadoop.

a.    Flavors of Hadoop: BigInsights, Google Query, etc.

4.Hadoop Eco-system components: Introduction

a.    MapReduce

b.    HDFS

c.    Apache Pig

d.    Apache Hive

e.    HBASE

f.    Apache Oozie

g.    FLUME

h.    SQOOP

i.    Apache Mahout

j.    KIJI

k.    LUCENE

l.    SOLR

m.    KiteSDK.

n.    Impala

o.    Chukwa

p.    Shark

q.    Cascading.

Day 2:

1.Understanding Hadoop Cluster

2.Hadoop Core-Components.

a.    NameNode.

b.    JobTracker.

c.    TaskTracker.

d.    DataNode.

e.    SecondaryNameNode.

3.HDFS Architecture

a.    Why 64MB?

b.    Why Block?

c.    Why replication factor 3?

4.Discuss NameNode and DataNode.

5.Discuss JobTracker and TaskTracker.
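The block questions above (why split files into blocks, why a replication factor of 3) can be previewed with a toy model in plain Python. This is an illustrative sketch, not HDFS code: the DataNode names are invented, and the round-robin placement is a simplification of HDFS's rack-aware placement policy.

```python
# Toy model (not HDFS source): split a file into fixed-size blocks and
# assign each block to 3 distinct DataNodes.
import itertools

BLOCK_SIZE = 64 * 1024 * 1024  # classic HDFS default block size: 64 MB
REPLICATION = 3                # classic HDFS default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering the whole file."""
    blocks, offset = [], 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

def place_replicas(blocks, datanodes, replication=REPLICATION):
    """Round-robin placement: each block goes to `replication` distinct nodes."""
    ring = itertools.cycle(datanodes)
    return {i: [next(ring) for _ in range(replication)]
            for i in range(len(blocks))}

blocks = split_into_blocks(200 * 1024 * 1024)   # a 200 MB file
plan = place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4", "dn5"])
print(len(blocks))   # 4 blocks: three of 64 MB, the last holds the 8 MB remainder
```

Losing any single node still leaves two live copies of every block, which is why the NameNode can simply re-replicate in the background when a DataNode dies.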

Day 3:

1.Typical workflow of Hadoop application

2.Rack Awareness.

a.    Network Topology.

b.    Assignment of Blocks to Racks and Nodes.

c.    Block Reports

d.    Heart Beat

e.    Block Management Service.

3.Anatomy of File Write.

4.Anatomy of File Read.

5.Heart Beats and Block Reports
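The heartbeat mechanism above can be sketched in miniature: the NameNode records the last heartbeat time per DataNode and treats a silent node as dead. A conceptual model only, not HDFS code; the 10-minute threshold mirrors HDFS's classic default timeout, and the function and node names are our own.

```python
# Toy model of heartbeat tracking: a DataNode whose last heartbeat is
# older than the timeout is considered dead by the NameNode.
HEARTBEAT_TIMEOUT = 10 * 60  # seconds; roughly HDFS's classic default

def live_nodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the DataNodes whose last heartbeat is within the timeout."""
    return sorted(n for n, t in last_heartbeat.items() if now - t <= timeout)

heartbeats = {"dn1": 1000.0, "dn2": 100.0, "dn3": 990.0}
print(live_nodes(heartbeats, now=1005.0))  # ['dn1', 'dn3'] -- dn2 timed out
```

Once a node drops out of the live set, the NameNode schedules re-replication of the blocks that node was holding, using the block reports it has on file.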

Day 4:

1.Discuss Secondary NameNode.

2.Usage of FsImage and Edits log.

Day 5:

1.Map Reduce Overview

2.Best Practices to setup Hadoop cluster

3.Cluster Configuration

a.    core-default.xml

b.    hdfs-default.xml

c.    mapred-default.xml

d.    hadoop-env.sh

e.    slaves

f.    masters

4.Need of *-site.xml

Week 2:

Day 1:

1.Map Reduce Framework

2.Why Map Reduce?

3.Use cases where Map Reduce is used.

4.Hello World program with a weather use case.

a.    Set up the environment for the programs.

b.    Possible ways of writing a MapReduce program, with sample code; find the best approach and discuss.

c.    Configured, Tool, GenericOptionsParser, and queue usage.

d.    Demo for calculating maximum and minimum temperature.

5.Limitations of the traditional way of solving word count on a large dataset.

6.The MapReduce way of solving the problem.
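The MapReduce way of solving word count can be previewed in plain Python before writing Java jobs. The three functions below stand in for the mapper, the framework's shuffle/sort, and the reducer; this is a conceptual sketch of the data flow, not Hadoop API code.

```python
# Conceptual sketch of the MapReduce flow for word count.
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["car"])   # 3
```

The point of the exercise: neither the mapper nor the reducer ever sees the whole dataset, which is exactly what lets Hadoop run each phase in parallel across many machines.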

Day 2:

1.Complete overview of MapReduce.

2.Split Size


Day 3:

1.Multi Reducers

2.Parts of Map Reduce


Day 4:

1.Apache Hadoop – Single Node Installation Demo

2.NameNode – format.

Day 5:

1.Apache Hadoop – Multi Node Installation Demo

2.Add nodes dynamically to a cluster with Demo

3.Remove nodes dynamically from a cluster with Demo.

4.Safe Mode.

5.Hadoop cluster modes.

a.    Standalone Mode

b.    Pseudo-distributed Mode

c.    Fully distributed Mode

Week 3:

Day 1:


1.HDFS Practicals (HDFS Commands)

Day 2:

1.Map Reduce Anatomy

a.    Job Submission.

b.    Job Initialization.

c.    Task Assignments.

d.    Task Execution.



Day 3:

1.Map Reduce Failure Scenarios

2.Speculative Execution

3.Sequence File

4.Input File Formats

5.Output File Formats

6.Writable DataTypes

7.Custom Input Formats

8.Custom keys, Values usage of writables.

Day 4:

1.Walk through the installation process using Cloudera Manager.

2.Review the sample example list for the installation.

3.Demo on teragen, wordcount, inverted index, and other examples.

4.Debugging Map Reduce Programs

Day 5:

1.MapReduce Advanced Concepts

2.Partitioning and Custom Partitioners

3.Multiple Outputs

4.MRUnit test cases

5.MapReduce Design Patterns

6.Distributed Cache

a.    Command-line implementation

7.MapReduce API implementation
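Partitioning decides which reducer receives which key. A minimal sketch in plain Python, with our own function names: the first function mimics the default hash partitioner's hash(key) mod numReducers rule, and the second shows the kind of routing a custom partitioner can impose instead.

```python
# Sketch of partitioning (not Hadoop API code): a partitioner maps a key
# to a reducer index in [0, num_reducers).
def default_partition(key, num_reducers):
    """Mimics the default hash partitioner: same key, same reducer."""
    return hash(key) % num_reducers

def first_letter_partition(key, num_reducers):
    """A 'custom partitioner': route by first letter instead of full hash."""
    return ord(key[0].lower()) % num_reducers

keys = ["apple", "avocado", "banana", "cherry"]
buckets = {k: first_letter_partition(k, 4) for k in keys}
# 'apple' and 'avocado' land on the same reducer: they share a first letter.
```

A custom partitioner is how you control data skew and grouping: if all keys starting with one letter must be processed together, the partitioner above guarantees it, at the cost of possibly uneven reducer load.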

Week 4:

Day 1:

1.MapReduce Advanced Concepts examples

2.Introduction to course Project.

Day 2:

1.Data loading techniques.

a.    Hadoop Copy commands

i.    put, get, copyFromLocal, copyToLocal, mv, chmod,
rmr, rmr -skipTrash, distcp, ls, lsr, df, du, cp,
moveFromLocal, moveToLocal, text, touchz, tail, mkdir, help.

b.    Flume.

c.    Sqoop.

2.Demo for Hadoop Copy Commands

3.Sqoop Theory

4.Demo for Sqoop.

Day 3:

1.Need for Pig

2.Why was Pig created?

3.Introduction to skew join.

4.Why go for Pig when MapReduce is there?

5.Pig use cases.

6.Pig built in operators

7.Pig store schema.

Day 4:

1.Pig relational operators:

a.    Load

b.    Store

c.    Dump

d.    Filter.

e.    Distinct

f.    Group

g.    CoGroup

h.    Join

i.    Stream

j.    Foreach Generate

k.    Parallel.

l.    Distinct

m.    Limit

n.    ORDER

o.    CROSS

p.    UNION

q.    SPLIT

r.    Sampling

2.Dump Vs Store


3.Pig data types

a.    Complex

i.    Bag

b.    Primitives

i.    Integers
ii.    Double

4.Diagnostic Operators

a.    Describe

b.    Explain

c.    Illustrate


5.Pig UDFs

a.    Filter Functions
b.    Eval Functions
c.    Macros
d.    Demo

6.Storage Handlers.
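Pig's GROUP and FOREACH ... GENERATE operators can be demystified with an equivalent in plain Python. A conceptual sketch only; the student relation and its fields are invented for illustration, and the Pig statements appear as comments above the Python that mimics them.

```python
# What Pig's GROUP and FOREACH ... GENERATE compute, in plain Python.
from collections import defaultdict

students = [("alice", 3.9), ("bob", 3.1), ("alice", 3.5), ("bob", 3.7)]

# grouped = GROUP students BY name;
grouped = defaultdict(list)
for name, gpa in students:
    grouped[name].append(gpa)

# result = FOREACH grouped GENERATE group, AVG(students.gpa);
averages = {name: sum(gpas) / len(gpas) for name, gpas in grouped.items()}
print(round(averages["alice"], 2))   # 3.7
```

Under the hood Pig compiles exactly this shape into MapReduce: GROUP becomes the shuffle on the grouping key, and the FOREACH expression runs in the reducer over each key's bag.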

Day 5:

1.Pig Practicals and Usecases.

2.Demo using schema.

3.Demo without using schema.

Week 5:

Day 1:

1.    Hive Background.

2.    What is Hive?

3.    Pig Vs Hive

4.    Where to Use Hive?

5.    Hive Architecture

6.    Metastore

7.    Hive execution modes.

8.    External, Managed, Native and Non-native tables.

Day 2:

1.Hive Partitions:

a.    Dynamic Partitions
b.    Static Partitions


3.Hive DataModel

4.Hive DataTypes

a.    Primitive
b.    Complex

5.Hive table operations:

a.    Create Managed Table

b.    Load Data

c.    Insert overwrite table

d.    Insert into Local directory.

e.    CTAS.

f.    Insert Overwrite table select.
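Hive stores a partitioned table as one directory per partition value (e.g. a `country=US` subdirectory); with static partitioning the writer names the target partition, while with dynamic partitioning it is derived from a column of each inserted row. A plain-Python sketch of that dynamic routing, with invented table data:

```python
# Toy model of Hive dynamic partitioning (not Hive code): route each row
# to a partition "directory" keyed by one of its column values.
from collections import defaultdict

rows = [("alice", "US"), ("bob", "IN"), ("carol", "US")]

def dynamic_insert(rows, partition_col_index):
    """Place each row under a directory named after its partition column."""
    table = defaultdict(list)
    for row in rows:
        part = row[partition_col_index]
        table[f"country={part}"].append(row)
    return table

table = dynamic_insert(rows, 1)
print(sorted(table))   # ['country=IN', 'country=US']
```

The payoff is partition pruning: a query filtered on `country = 'US'` only needs to read the one matching directory instead of scanning the whole table.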

Day 3:

1.Hive Joins:

a.    Inner Joins

b.    Outer Joins

c.    Skew Joins

2.Multi-table Inserts

3.Multiple files, directories, table inserts.






Day 4:

1.Hive Practicals

Day 5:

1.Oozie Architecture

2.Workflow designing in Oozie


Week 6:

Day 1:

1.YARN Architecture

2.Hadoop Classic vs YARN

3.YARN Demo

Day 2:

1.Flume Architecture

2.Flume Practicals

3.ZooKeeper

Day 3:

1.Introduction to NoSQL Databases.

2.The NoSQL Landscape

3.Introduction to HBase


5.Create a table in HBase using the HBase shell

6.Where to use HBase?

7.Where not to use HBase?

8.Write files to HBase.

9.Major components of HBase.

a.    HBase Master.

b.    HRegionServer.

c.    HBase Client.

d.    ZooKeeper.

e.    Region.

Day 4:


2.HBase -ROOT- catalog table

3.CAP Theorem



6.Sparse Datastore.

Day 5:

1.Cassandra Architecture

2.Bigtable and Dynamo

3.Distributed Hash Table, P2P & Fault Tolerant

4.Data Modelling

5.Column Families

6.Installation Demo on Cassandra
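The distributed hash table idea behind Cassandra can be sketched as a simple hash ring in Python: each node owns a point on a ring, and a key belongs to the first node at or after the key's hash position, wrapping around. This is an illustrative model only (one token per node, no virtual nodes, no replication); all node names are invented.

```python
# Toy consistent-hashing ring (not Cassandra code).
import bisect
import hashlib

def ring_position(value, ring_size=2**16):
    """Deterministic position of a string on the ring via MD5."""
    digest = hashlib.md5(value.encode()).hexdigest()
    return int(digest, 16) % ring_size

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, key):
        """Owner of `key`: first node at or after the key's position."""
        positions = [p for p, _ in self.ring]
        idx = bisect.bisect_right(positions, ring_position(key))
        return self.ring[idx % len(self.ring)][1]  # wrap around the ring

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
# The same key always maps to the same node, with no central lookup table.
```

This is why adding or removing a node only moves the keys in one arc of the ring rather than reshuffling everything, which is the fault-tolerance property the P2P design is after.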


Week 7:

Day 1:

1.Real time Project Analysis





6.Optimization Techniques

7.Which one to use where

Day 2:

1.Apache Storm

2.Use case with practicals

Day 3:

1.Amazon Web Services (Hadoop on the Cloud) – Multi-node installations

2.EMR and S3.

Day 4:

1.Kafka Architecture and use case

Day 5:

1.Spark Architecture

2.Spark Practicals

3.Spark vs Hive vs Splunk

Week 8:

Day 1:

1.Impala Architecture

2.Impala Practicals

3.Ad hoc querying in Impala

Day 2:

1.Compression Techniques


2.Image processing in Hadoop

3.Scala with example

4.Certification Preparation Guidelines

Day 3:

1.Best Practices to setup Hadoop cluster

2.Commissioning and Decommissioning Nodes

3.Benchmarking the Hadoop cluster

4.Admin monitoring tools

5.Routine Admin tasks

Day 4:

1.Introduction and overview: Hortonworks Sandbox and single-node installation.

Day 5:

1.Running Hadoop applications on Hortonworks Tez

2.Overview of Ambari.

Note: This includes classes on Unix basics and practicals, and on Core Java theory and practicals, totalling around 10 hours.

Our Hadoop Online Training batches start every week, and we accommodate flexible timings.