Hadoop Online Training

Hadoop Online Training by united trainings is a well detailed course content prepared by our expert Hadoop training Professionals.

Demo Session:-

Course Content – Hadoop online course

Introduction to Big Data and Hadoop

  • What is Big Data?
  • What are the challenges for processing big data?
  • What technologies support big data?
  • 3V’s of BigData and Growing.
  • What is Hadoop?
  • Why Hadoop and its Use cases
  • History of Hadoop
  • Different Ecosystems of Hadoop.
  • Advantages and Disadvantages of Hadoop
  • Real Life Use Cases

HDFS (Hadoop Distributed File System)

  • HDFS architecture
  • Features of HDFS
  • Where does it fit and Where doesn’t fit?
  • HDFS daemons and its functionalities
  • Name Node and its functionality
  • Data Node and its functionality
  • Secondary Name Node and its functionality
  • Data Storage in HDFS
  • Introduction about Blocks
  • Data replication
  • Accessing HDFS
  • CLI(Command Line Interface) and admin commands
  • Java Based Approach
  • Hadoop Administration
  • Hadoop Configuration Files
  • Configuring Hadoop Domains
  • Precedence of Hadoop Configuration
  • Diving into Hadoop Configuration
  • Scheduler
  • RackAwareness
  • Cluster Administration Utilities
  • Rebalancing HDFS DATA
  • Copy Large amount of data from HDFS
  • FSImage and Edit.log file theoretically and practically.

MAPREDUCE

  • Map Reduce architecture
    • JobTracker , TaskTracker and its functionality
    • Job execution flow
    • Configuring development environment using Eclipse
    • Map Reduce Programming Model
    • How to write a basic Map Reduce jobs
    • Running the Map Reduce jobs in local mode and distributed mode
    • Different Data types in Map Reduce
    • How to use Input Formatters and Output Formatters in Map Reduce Jobs
    • Input formatters and its associated Record Readers with examples
    • Text Input Formatter
    • Key Value Text Input Formatter
    • Sequence File Input Formatter
    • How to write custom Input Formatters and its Record Readers
    • Output formatters and its associated Record Writers with examples
    • Text Output Formatter
    • Sequence File Output Formatter
    • How to write custom Output Formatters and its Record Writers
    • How to write Combiners, Partitioners and use of these
    • Importance of Distributed Cache
    • Importance Counters and how to use Counters
  • Advance MapReduce Programming
  • Joins – Map Side and Reduce Side
    • Use of Secondary Sorting
    • Importance of Writable and Writable Comparable Api’s
    • How to write Map Reduce Keys and Values
    • Use of Compression techniques
    • Snappy, LZO and Zip
    • How to debug Map Reduce Jobs in Local and Pseudo Mode.
    • Introduction to Map Reduce Streaming and Pipes with examples
    • Job Submission
    • Job Initialization
    • Task Assignment
    • Task Execution
    • Progress and status bar
    • Job Completion
    • Failures
    • Task Failure
    • Tasktracker failure
    • JobTracker failure
    • Job Scheduling
    • Shuffle & Sort in depth
    • Diving into Shuffle and Sort
    • Dive into Input Splits
    • Dive into Buffer Concepts
    • Dive into Configuration Tuning
    • Dive into Task Execution
    • The Task assignment Environment
    • Speculative Execution
    • Output Committers
    • Task JVM Reuse
    • Multiple Inputs & Multiple Outputs
    • Build In Counters
    • Dive into Counters – Job Counters & User Defined Counters
    • Sql operations using Java MapReduce
    • Introduction to YARN (Next Generation Map Reduce)

Apache HIVE

  • Hive Introduction
  • Hive architecture
  • Driver
  • Compiler
  • Semantic Analyzer
  • Hive Integration with Hadoop
  • Hive Query Language(Hive QL)
  • SQL VS Hive QL
  • Hive Installation and Configuration
  • Hive, Map-Reduce and Local-Mode
  • Hive DLL and DML Operations
  • Hive Services
  • CLI
  • Schema Design
  • Views
  • Indexes
  • Hiveserver
  • Metastore
    • embedded metastore configuration
    • external metastore configuration
    • Transformations in Hive
    • UDFs in Hive
    • How to write a simple hive queries
    • Usage
    • Tuning
    • Hive with HBASE Integration
    • Need to add some more R&D done by myself

Apache PIG

  • Introduction to Apache Pig
  • Map Reduce Vs Apache Pig
  • SQL Vs Apache Pig
  • Different data types in Pig
  • Modes Of Execution in Pig
  • Local Mode
  • Map Reduce Mode
  • Execution Mechanism
  • Grunt Shell
  • Script
  • Embedded
  • Transformations in Pig
  • How to write a simple pig scrip
  • UDFs in Pig
  • Pig with HBASE Integration
  • Need to add some more R&D done by myself

Apache SQOOP

  • Introduction to Sqoop
  • MySQL client and Server Installation
  • How to connect to Relational Database using Sqoop
  • Sqoop Commands and Examples on Import and Export commands.
  • Transferring an Entire Table
  • Specifying a Target Directory
  • Importing only a Subset of data
  • Protecting your password
  • Using a file format other than CSV
  • Compressing Imported Data
  • Speeding up Transfers
  • Overriding Type Mapping
  • Controlling Parallelism
  • Encoding Null Values
  • Importing all your tables
  • Incremental Import
  • Importing only new data
  • Incrementing Importing Mutable data
  • Preserving the last imported value
  • Storing Password in the Metastore
  • Overriding arguments to a saved job
  • Sharing the MetaStore between sqoop client
  • Importing data from two tables
  • Using Custom Boundary Queries
  • Renaming Sqoop Job instances
  • Importing Queries with duplicate columns
  • Transferring data from Hadoop
  • Inserting Data in Batches
  • Exporting with All or Nothing Semantics
  • Updating an Existing Data Set
  • Updating or Inserting at the same time
  • Using Stored Procedures
  • Exporting into a subset of columns
  • Encoding the Null Value
  • Encoding the Null Value Differently
  • Exporting Corrupted Data

Apache FLUME

  • Introduction to flume
  • Flume agent usage

Apache Hbase

  • Hbase introduction
  • Hbase basics
  • Column families
  • Scans
  • Hbase installation
  • Hbase Architecture
  • Storage
  • WriteAhead Log
  • Log Structured MergeTrees
  • Mapreduce integration
  • Mapreduce over Hbase
  • Hbase Usage
  • Key design
  • Bloom Filters
  • Versioning
  • Filters
  • Hbase Clients
  • REST
  • Thrift
  • Hive
  • Web Based UI
  • Hbase Admin
  • Schema definition
  • Basic CRUD operations
  • Apache OOZIE
  • Introduction to Oozie
  • Executing workflow jobs

Hadoop Installation on Linux, All other ecosystems installations on Linux.

Cluster setup (200 Nodes cluster) knowledge sharing with setup document.

Cloudera & Hortonworks