4 to 8 weeks Practical Hands-On Big Data Hadoop Developer Certification training in Helsinki | Big Data Training | Hadoop training | Big Data analytics training | Hortonworks, Cloudera, HDFS, Map Reduce, YARN, Pig, Hive, Sqoop, Flume, Ambari training


Entirety Technology
Wednesday, 4 September 2019
From 4:30
Friday, 27 September 2019
To 6:30
The first 16 hours of this course cover Big Data technical essentials: the foundations of Hadoop, the Big Data technology stack, HDFS, Hive, Pig and Sqoop, how to set up a Hadoop cluster, how to store Big Data using Hadoop (HDFS), and how to process and analyze Big Data using MapReduce programming or other Hadoop ecosystem tools.
The next 16 hours cover all the course topics in depth, with the hands-on lab exercises listed in the comprehensive course outline below.

Course Schedule for First 8 sessions (First 16 Hours)

This is a weekday course held September 3 - September 26, 2019 (US Pacific Time).
Class sessions are held on Tuesdays and Thursdays every week,
6:30-8:30 PM US Pacific Time, each day.
Please check your local date and time for the first session.

Course Schedule for Next 8 sessions (Next 16 Hours)

Weekdays October 1 - October 24, 2019 US Pacific Time
Class sessions are held on Tuesdays and Thursdays every week,
6:30-8:30 PM US Pacific time, each day.

View Detailed Weekly Training Schedule at the bottom of this event listing.

Course Objectives

Knowledge of Hadoop components such as MapReduce, Sqoop, HBase, Hive, Pig, HDFS, Flume, ZooKeeper, Oozie, etc.
Ability to work on Hadoop related Projects as an individual contributor or as part of a team.
Set up, install and configure Hadoop in different environments: development, support and test
Hadoop architecture and various operations performed on it
Familiarity with various Hadoop Solutions.

Desired but not required: exposure to or working proficiency in BI, SQL, scripting, handling and managing data and databases, Excel, the Java programming language, and basic UNIX commands.

Course Features

4-8 weeks, 8-16 sessions, 16-32 hours of total LIVE Instruction
Training material, instructor handouts and access to useful resources on the cloud provided
Practical hands-on lab exercises on cloud workstations provided
Actual code and scripts provided
Real-life Scenarios

Course Outline

This is a comprehensive course outline. It is also a guideline, indicating which topics might be covered during the class. The instructors may adjust this outline and the actual course content based on the skills, experience and background of the students, as shared during introductions at the beginning of the first session.
We strive to cover as many topics from this course outline as possible during the training. If enough students are interested in learning additional topics, or in covering topics in more depth, beyond the 32 hours of training, we can hold additional sessions for an extra charge. One-on-one tutoring is also available and may be slightly more expensive than group training.

Big Data Basics

An introduction to Big Data
Why Big Data? Why now?
The Three Dimensions of Big Data (Three Vs)
Evolution of Big Data
Big Data versus Traditional RDBMS Databases
Big Data versus Traditional BI and Analytics
Big Data versus Traditional Storage
Key Challenges in Big Data adoption
Benefits of adoption of Big Data
Introduction to Big Data Technology Stack
Apache Hadoop Framework
Introduction to Microsoft HDInsight – Microsoft’s Big Data Service
Hands-On Lab Exercises

The Big Data Technology Stack

Basics of Hadoop Distributed File System (HDFS)
Basics of Hadoop Distributed Processing (Map Reduce Jobs)
Hands-On Lab Exercises

Deep dive into Hadoop Distributed File System (HDFS) 

Reading files with HDFS
Writing files with HDFS
Error Handling
Design and Concepts of HDFS
Blocks, Name nodes, Data nodes
HDFS High-Availability
HDFS Federation
HDFS Command-Line Interface
Basic File System Operations
Anatomy of File Read and Write
Block Placement Policy and Modes
Configuration files - Detailed explanation
FS image
Edit log
Secondary Name Node
Safe Mode
How to add New Data Node dynamically
How to decommission Data Nodes dynamically without stopping cluster
FSCK Utility
How to override default configuration at Programming level and system level
ZooKeeper Leader Election Algorithm
Hands-On Lab Exercises

Processing Big Data – MapReduce and YARN

How MapReduce works
Handling Common Errors
Bottlenecks with MapReduce
How YARN (MapReduceV2) works
Difference between MR1 and MR2
Error Handling
Running a simple MapReduce application (word count)
Running a custom MapReduce application (census data)
Running MapReduce via PowerShell
Running a MapReduce application using PowerShell
Monitoring application status
Hands-On Lab Exercises
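
As a rough illustration of the word count exercise listed above, the sketch below mimics the map, shuffle and reduce phases in plain Python. It does not use the Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, chosen only for this example. The code used in class may look quite different.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    # Keys are processed in sorted order, as a reducer would see them.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

On a real cluster the map and reduce steps run in parallel on different nodes and the framework performs the shuffle. Here everything runs in one process purely to show the data flow.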

Big Data Development Framework

Introduction to HIVE
Introduction to PIG
Loading the data into HIVE
Submitting Pig jobs using HDInsight
Submitting Pig jobs via PowerShell
Hands-On Lab Exercises

Big Data Integration and Management

Big Data Integration using Polybase
Big Data Management using Ambari
Fetching HDInsight data into SQL
Using Ambari for managing HDInsight cluster
Hands-On Lab Exercises

Map Reduce

Basics of Functional Programming
Map Reduce Basics
How Map Reduce Works
Anatomy of Map Reduce Job
Legacy Architecture: Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
Job Completions and Failures
Shuffling, Sorting
Splits, Record reader, Partition, Types of partitions and Combiner
Optimization Techniques: Speculative Execution, JVM Reuse
Schedulers, Counters
Comparisons between Old, New API at code and Architecture Level
Getting data from RDBMS into HDFS using Custom data types
Distributed Cache and Hadoop Streaming (Python, Ruby, and R)
Hands-On Lab Exercises


Sequence Files and Map Files
Enabling Compression Codecs
Map-side Join with Distributed Cache
Types of I/O Formats: MultipleOutputs, NLineInputFormat
Handling small files using CombineFileInputFormat
Hands-On Lab Exercises

Map Reduce and Java Programming

Hands-on “Word Count” in MapReduce in standalone and pseudo-distributed mode
Sorting files using Hadoop Configuration API discussion
Emulating “grep” for searching inside a file in Hadoop
DBInputFormat
Job Dependency API discussion
Input Format API discussion, Split API discussion
Custom Data type creation in Hadoop
Hands-On Lab Exercises


NoSQL

CAP Theorem and Types of Consistency
Types of NoSQL Databases in detail
Columnar Databases in Detail (HBASE and CASSANDRA)
TTL, Bloom Filters and Compaction
Hands-On Lab Exercises


HBase

Data Model of HBase and Comparison between RDBMS and NoSQL
Master and Region Servers
DDL and DML HBase Operations
Architecture of HBase
HBase Catalog Tables
HBase Block Cache and sharding
HBase DATA Modeling (Sequential, Salted, Promoted and Random Keys)
Java APIs and REST Interface
Client-Side Buffering, and processing 1 million records with it
HBase Counters
Enabling Replication and HBase RAW Scans
HBase Filters
Bulk Loading and Coprocessors (Endpoints and Observers with programs)
Hands-On Lab Exercises
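
To give a feel for the "salted keys" item in the list above, here is a minimal sketch in plain Python. It assumes salting means prepending a small hash-derived bucket prefix to each row key so that sequential keys spread across regions. The function name `salted_key`, the bucket count and the `NN-key` format are all hypothetical choices for this example. Nothing here is part of the HBase client API.

```python
import hashlib


def salted_key(row_key: str, num_buckets: int = 4) -> str:
    """Prefix a row key with a deterministic bucket number.

    Assumption for illustration: the bucket is the first byte of the
    key's MD5 digest, modulo num_buckets, zero-padded to two digits.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    return f"{bucket:02d}-{row_key}"


if __name__ == "__main__":
    # Sequential keys such as timestamps may land in different buckets.
    for key in ["20190903-0001", "20190903-0002", "20190903-0003"]:
        print(salted_key(key))
```

The trade-off is that a range scan over the original keys then has to query every bucket, which is one reason salting is usually weighed against the other key designs listed above.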


Hive

Introduction to Hive
Hive Architecture
Hive Installation
Hive Services, Shell, Server, Web Interface (HWI)
Metastore, HiveQL
Working with Tables
Primitive data types
Complex data types
Working with Partitions
User-Defined Functions
Hive Bucketed Tables and Sampling
External partitioned tables
Map the data to the partition in the table
Write the output of one query to another table, Multiple inserts
Dynamic Partition
Differences between ORDER BY, DISTRIBUTE BY and SORT BY
Bucketing and Sorted Bucketing with Dynamic partition
RCFile
Compression on Hive tables and Migrating Hive tables
Dynamic substitution in Hive and Different ways of running Hive
How to enable Update in HIVE
Log Analysis on Hive
Access HBASE tables using Hive
Hands-on Lab Exercises


Pig

Execution Types
Grunt Shell
Pig Latin
Data Processing
Schema on read
Primitive data types and complex data types
Tuple schema, BAG Schema, and MAP Schema
Loading and Storing
Filtering, Grouping, and Joining
Debugging commands (Illustrate and Explain)
Validations, Type casting in PIG
Working with Functions
User-Defined Functions
Types of JOINS in pig and Replicated Join in detail
SPLITS and Multiquery execution
Error Handling, FLATTEN and ORDER BY
Parameter Substitution
Nested For Each
User-Defined Functions, Dynamic Invokers, and Macros
How to access HBASE using PIG, Load and Write JSON DATA using PIG
Piggy Bank
Hands-on Lab Exercises


Sqoop

Importing Data (full table, subset only, target directory, protecting the password, file formats other than CSV, compression, controlling parallelism, importing all tables)
Incremental Import (importing only new data, last imported data, storing the password in the Metastore, sharing the Metastore between Sqoop clients)
Free Form Query Import
Export data to RDBMS, HIVE, and HBASE
Hands-on Lab Exercises


HCatalog

About HCatalog with Pig, Hive, and MapReduce
Hands-on Lab Exercises


Flume

Introduction and Overview
Flume Agents: Sources, Channels, and Sinks
Log User information using Java program into HDFS using LOG4J and Avro Source, Tail Source
Log User information using Java program into HBASE using LOG4J and Avro Source, Tail Source
Flume Commands
Hands-on Lab Exercises

Different Hadoop Ecosystems



Oozie workflows (Start, Action, End, Kill, Join and Fork nodes), schedulers, coordinators and bundles, showing how to schedule Sqoop, Hive, MapReduce and Pig jobs
Real-world use case: finding the top websites visited by users in certain age groups, scheduled to run every hour
ZooKeeper
HBASE Integration with HIVE and PIG
Proof of concept (POC)
Hands-on Lab Exercises


Spark

Spark Overview
Linking with Spark, Initializing Spark
Using the Shell
Resilient Distributed Datasets (RDDs)
Parallelized Collections
External Datasets
RDD Operations
Basics, Passing Functions to Spark
Working with Key-Value Pairs
RDD Persistence
Which Storage Level to Choose?
Removing Data
Shared Variables
Broadcast Variables
Deploying to a Cluster
Unit Testing
Migrating from pre-1.0 Versions of Spark

Detailed Weekly Schedule for First 8 sessions (1st 16 Hours)

September 3, 2019 | 6:30 PM to 8:30 PM US Pacific Time
September 5, 2019 | 6:30 PM to 8:30 PM US Pacific Time
September 10, 2019 | 6:30 PM to 8:30 PM US Pacific Time
September 12, 2019 | 6:30 PM to 8:30 PM US Pacific Time
September 17, 2019 | 6:30 PM to 8:30 PM US Pacific Time
September 19, 2019 | 6:30 PM to 8:30 PM US Pacific Time
September 24, 2019 | 6:30 PM to 8:30 PM US Pacific Time
September 26, 2019 | 6:30 PM to 8:30 PM US Pacific Time

Detailed Weekly Schedule for Next 8 sessions (Additional 16 Hours)

October 1, 2019 | 6:30 PM to 8:30 PM US Pacific Time
October 3, 2019 | 6:30 PM to 8:30 PM US Pacific Time
October 8, 2019 | 6:30 PM to 8:30 PM US Pacific Time
October 10, 2019 | 6:30 PM to 8:30 PM US Pacific Time
October 15, 2019 | 6:30 PM to 8:30 PM US Pacific Time
October 17, 2019 | 6:30 PM to 8:30 PM US Pacific Time
October 22, 2019 | 6:30 PM to 8:30 PM US Pacific Time
October 24, 2019 | 6:30 PM to 8:30 PM US Pacific Time

Refund Policy

All Sales are Final. There are no Refunds.
If a student is not happy with the training experience, we strive to listen, take the feedback and implement honest and sincere measures to meet and exceed student expectations. 
If a class is rescheduled or cancelled by the organizer, registered students will be offered a credit towards any future course.
