Hadoop Developer/Admin Training – Course Content
Overview:
Apache Hadoop, the open-source data management software that helps organizations analyze huge volumes of structured and unstructured data, is a hot topic across the tech industry. Through technical sessions and hands-on labs, students quickly learn to take advantage of the MapReduce framework.
Training Objectives of Hadoop Developer/Admin:
This Hadoop course covers the basic concepts of developing MapReduce applications with Hadoop, including a close look at the framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. The course also examines related technologies such as Hive, Pig, and Apache Accumulo.
Target Students / Prerequisites:
Students should have an IT background and be familiar with basic Java and Linux concepts.
Hadoop Architecture
- Introduction to Hadoop
- Parallel Computing vs. Distributed Computing
- How to install Hadoop on your system
- How to install a Hadoop cluster on multiple machines
- Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker
- Exploring HDFS (Hadoop Distributed File System)
- Exploring the HDFS Web UI
- NameNode architecture (EditLog, FsImage, location of replicas)
- Secondary NameNode architecture
- DataNode architecture
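The block arithmetic behind HDFS storage can be sketched as follows. This is a minimal plain-Java illustration, not part of the Hadoop API; the class and method names are hypothetical. It assumes the Hadoop 2.x default block size of 128 MB (Hadoop 1.x, the JobTracker/TaskTracker generation covered here, defaulted to 64 MB) and the default replication factor of 3.

```java
// Hypothetical helper (not Hadoop API): estimates how HDFS splits a file
// into blocks and how much raw disk space the replicas occupy.
public class HdfsBlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed for a file; the last block may be partial.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw storage across the cluster: every byte is stored `replication` times.
    // HDFS does not pad a partial last block, so no rounding up is needed.
    // Checksum (.meta) files on the DataNodes are ignored here.
    static long rawStorage(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 300 * MB;
        long block = 128 * MB;   // assumed Hadoop 2.x default; Hadoop 1.x used 64 MB
        int replication = 3;     // assumed default replication factor
        System.out.println("blocks: " + blockCount(file, block));
        System.out.println("raw MB: " + rawStorage(file, replication) / MB);
    }
}
```

For a 300 MB file this gives 3 blocks (the last one holding only 44 MB) and 900 MB of raw storage across the DataNodes.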
MapReduce Architecture
- Exploring JobTracker/TaskTracker
- How a client submits a MapReduce job
- Exploring Mapper/Reducer/Combiner
- Shuffle: Sort & Partition
- Input/output formats
- Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
- Exploring the MapReduce Web UI
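The stages a MapReduce job passes through (map, partition, shuffle and sort, reduce) can be sketched in plain Java without any Hadoop dependency. This is a simplified single-process simulation with hypothetical class and method names; only the partition formula mirrors Hadoop's default HashPartitioner. Real jobs instead extend org.apache.hadoop.mapreduce.Mapper and Reducer and run as distributed tasks on the cluster.

```java
import java.util.*;

// Single-process simulation of the MapReduce data flow (word count).
// Not the Hadoop API; names are illustrative only.
public class MapReduceFlow {

    // Map phase: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return out;
    }

    // Partition: same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Runs the whole job; returns one sorted result map per reducer.
    static List<SortedMap<String, Integer>> run(List<String> input, int numReducers) {
        // Shuffle: route each pair to its partition and group values by key.
        // TreeMap keeps keys sorted, mimicking the sort before the reduce phase.
        List<SortedMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) shuffled.add(new TreeMap<>());
        for (String line : input) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                shuffled.get(partition(kv.getKey(), numReducers))
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        // Reduce phase: sum the grouped values for each key.
        List<SortedMap<String, Integer>> results = new ArrayList<>();
        for (SortedMap<String, List<Integer>> part : shuffled) {
            SortedMap<String, Integer> reduced = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) sum += v;
                reduced.put(e.getKey(), sum);
            }
            results.add(reduced);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("the quick brown fox", "the lazy dog");
        List<SortedMap<String, Integer>> out = run(input, 2);
        for (int i = 0; i < out.size(); i++) {
            System.out.println("reducer " + i + ": " + out.get(i));
        }
    }
}
```

Because every occurrence of a key hashes to the same partition, each word's total is computed by exactly one reducer, and each reducer sees its keys in sorted order.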
Hadoop Developer Tasks
- Writing a MapReduce program
- Reading and writing data using Java
- Hadoop Eclipse integration
- Mapper in detail
- Reducer in detail
- Using Combiners
- Reducing Intermediate Data with Combiners
- Writing Partitioners for Better Load Balancing
- Sorting in HDFS
- Searching in HDFS
- Indexing in HDFS
- Hands-On Exercise
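The benefit of a combiner can be illustrated by counting the records map tasks would send to the shuffle with and without local aggregation. This is a plain-Java sketch with hypothetical names and made-up input splits, not Hadoop code; in a real job the combiner is registered with Job.setCombinerClass(...), and a custom partitioner with Job.setPartitionerClass(...).

```java
import java.util.*;

// Plain-Java illustration (no Hadoop API) of how a combiner reduces
// intermediate data: each map task pre-aggregates its own (word, 1) pairs
// before they are shuffled to the reducers.
public class CombinerDemo {

    // Records a map task emits without a combiner: one (word, 1) per token.
    static int recordsWithoutCombiner(List<String> split) {
        int n = 0;
        for (String line : split)
            for (String w : line.split("\\s+"))
                if (!w.isEmpty()) n++;
        return n;
    }

    // Combiner: sums counts locally, so each distinct word leaves the map task once.
    static Map<String, Integer> combine(List<String> split) {
        Map<String, Integer> local = new HashMap<>();
        for (String line : split)
            for (String w : line.split("\\s+"))
                if (!w.isEmpty()) local.merge(w, 1, Integer::sum);
        return local;
    }

    public static void main(String[] args) {
        // Two hypothetical input splits, i.e. two map tasks.
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("a b a", "a c"),
            Arrays.asList("b b a"));
        int before = 0, after = 0;
        for (List<String> s : splits) {
            before += recordsWithoutCombiner(s);
            after += combine(s).size();
        }
        System.out.println("shuffled records without combiner: " + before);
        System.out.println("shuffled records with combiner:    " + after);
    }
}
```

With these two sample splits, the shuffled records drop from 8 to 5. Because Hadoop may apply a combiner zero, one, or several times, the combining function must be commutative and associative, which summation is.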
Hadoop Administrative Tasks
- Routine Administrative Procedures
- Understanding dfsadmin and mradmin
- Block Scanner, Balancer
- Health Check & Safe mode
- DataNode commissioning/decommissioning
- Monitoring and Debugging on a production cluster
- NameNode Backup and Recovery
- ACL (Access Control List)
- Upgrading Hadoop
HBase Architecture
- Introduction to HBase
- HBase vs. RDBMS
- Exploring HBase Master & region server
- Column Families and Regions
- Basic HBase shell commands
Hive Architecture
- Introduction to Hive
- HBase vs. Hive
- Installation of Hive
- HQL (Hive Query Language)
- Basic Hive commands
Pig Architecture
- Introduction to Pig
- Installation of Pig on your system
- Basic Pig commands
- Hands-On Exercise
Sqoop Architecture
- Introduction to Sqoop
- Installation of Sqoop on your system
- Importing/exporting data between an RDBMS and HDFS
- Importing/exporting data between an RDBMS and HBase
- Importing/exporting data between an RDBMS and Hive
- Hands-On Exercise
Mini Project / POC (Proof of Concept)
- Facebook-Hive POC
- Uses of Hadoop/Hive at Facebook
- Static & dynamic partitioning
- UDFs (User-Defined Functions)