Course Batch Starts, Timing, Price & Enroll

Program    Duration   Batch Starts   Time            Price #
Weekend    60 Hrs     Weekend        Morning-Batch   USD 600 / INR 30000
Weekend    60 Hrs     Weekend        Evening-Batch   USD 600 / INR 30000
Weekdays   60 Hrs     Weekdays       Morning-Batch   USD 600 / INR 30000
Weekdays   60 Hrs     Weekdays       Evening-Batch   USD 600 / INR 20000

# Cloud lab charges will be extra. Our technical consultant will share actual lab charges with you.

About Course

The target audience for this course includes:

Data analysts
Business Intelligence Specialists
Developers
System Architects
Database Administrators

This course is designed to help you understand the core concepts of Apache Hadoop, Pig, Hive and Impala so that you can put big data to practical use. In this course you will get end-to-end exposure to accessing, manipulating, transforming, and analyzing complex data sets using SQL and familiar scripting languages.

After completing this course you will be able to:

Navigate the Apache Hadoop ecosystem with ease
Understand various data analysis tools such as Pig, Hive, and Impala
Work with Apache Hadoop and data ETL (extract, transform, load)
Perform real-time and complex queries on datasets
Work with data transformation using custom scripts

Before attending this course you need to be familiar with big data concepts and data science terminology.
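
As a rough illustration of the ETL (extract, transform, load) idea named in the outcomes above, here is a minimal Python sketch that uses only the standard library. It is not Hadoop, Pig, Hive or Impala code and is not taken from the course material; the sample data, the `sales` table, the `customer` and `amount` fields, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real data store.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real job would read files from HDFS or local disk.
RAW = "customer,amount\nalice,10.5\nbob,not_a_number\nalice,4.5\n"

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert amounts to float and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append({"customer": row["customer"], "amount": float(row["amount"])})
        except ValueError:
            continue  # skip malformed records
    return clean

def load(rows, conn):
    """Load: insert the cleaned rows into a (hypothetical) sales table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount)", rows)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a data store
    load(transform(extract(RAW)), conn)
    # Once loaded, the data can be queried with SQL.
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
    print(conn.execute(query).fetchall())
```

The three steps are kept as separate functions so that each one can be tested or replaced independently, which is the same separation that ETL tooling in the Hadoop ecosystem applies at a much larger scale.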

CURRICULUM

Data Scientist

  • 1.1 The Motivation for Hadoop
  • 1.2 Hadoop Overview
  • 1.3 Data Storage: HDFS
  • 1.4 Distributed Data Processing: YARN, MapReduce and Spark
  • 1.5 Data Processing and Analysis: Pig, Hive and Impala
  • 1.6 Data Integration: Sqoop
  • 1.7 Other Hadoop Data Tools
  • 1.8 Exercise Scenarios Explanation
  • 2.1 What Is Pig?
  • 2.2 Pig’s Features
  • 2.3 Pig Use Cases
  • 2.4 Interacting with Pig
  • 3.1 Pig Latin Syntax
  • 3.2 Loading Data
  • 3.3 Simple Data Types
  • 3.4 Field Definitions
  • 3.5 Data Output
  • 3.6 Viewing the Schema
  • 3.7 Filtering and Sorting Data
  • 3.8 Commonly-Used Functions
  • 4.1 Storage Formats
  • 4.2 Complex/Nested Data Types
  • 4.3 Grouping
  • 4.4 Built-In Functions for Complex Data
  • 4.5 Iterating Grouped Data
  • 5.1 Techniques for Combining Data Sets
  • 5.2 Joining Data Sets in Pig
  • 5.3 Set Operations
  • 5.4 Splitting Data Sets
  • 6.1 Troubleshooting Pig
  • 6.2 Logging
  • 6.3 Using Hadoop’s Web UI
  • 6.4 Data Sampling and Debugging
  • 6.5 Performance Overview
  • 6.6 Understanding the Execution Plan
  • 6.7 Tips for Improving the Performance of Your Pig Jobs
  • 7.1 What Is Hive?
  • 7.2 What Is Impala?
  • 7.3 Schema and Data Storage
  • 7.4 Comparing Hive to Traditional Databases
  • 7.5 Hive Use Cases
  • 8.1 Databases and Tables
  • 8.2 Basic Hive and Impala Query Language Syntax
  • 8.3 Data Types
  • 8.4 Differences Between Hive and Impala Query Syntax
  • 8.5 Using Hue to Execute Queries
  • 8.6 Using the Impala Shell
  • 9.1 Data Storage
  • 9.2 Creating Databases and Tables
  • 9.3 Loading Data
  • 9.4 Altering Databases and Tables
  • 9.5 Simplifying Queries with Views
  • 9.6 Storing Query Results
  • 10.1 Partitioning Tables
  • 10.2 Choosing a File Format
  • 10.3 Managing Metadata
  • 10.4 Controlling Access to Data
  • 11.1 Joining Datasets
  • 11.2 Common Built-In Functions
  • 11.3 Aggregation and Windowing
  • 12.1 How Impala Executes Queries
  • 12.2 Extending Impala with User-Defined Functions
  • 12.3 Improving Impala Performance
  • 13.1 Complex Values in Hive
  • 13.2 Using Regular Expressions in Hive
  • 13.3 Sentiment Analysis and N-Grams
  • 13.4 Conclusion
  • 14.1 Understanding Query Performance
  • 14.2 Controlling Job Execution Plan
  • 14.3 Bucketing
  • 14.4 Indexing Data
  • 15.1 SerDes
  • 15.2 Data Transformation with Custom Scripts
  • 15.3 User-Defined Functions
  • 15.4 Parameterized Queries

Exam & Certification

Cloudera offers a certification path named CCP Data Scientist. To obtain this certification you need to pass three exams:

DS700 – Descriptive and Inferential Statistics on Big Data
DS701 – Advanced Analytical Techniques on Big Data
DS702 – Machine Learning at Scale

Each exam is a single challenge scenario, and you have 8 hours to complete it. The exams may be taken in any order, but to earn a valid CCP Data Scientist certificate you must pass all three within 365 days of each other.
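
The 365-day rule above can be sketched as a simple date check. This is only an illustration: the function name `within_window`, the sample pass dates, and the reading of "within 365 days" as an inclusive gap between the earliest and latest pass dates are assumptions, not Cloudera's official definition.

```python
from datetime import date

def within_window(pass_dates, window_days=365):
    """Return True if the earliest and latest pass dates are at most window_days apart."""
    # Assumption: the limit is inclusive, so a gap of exactly 365 days still counts.
    return (max(pass_dates) - min(pass_dates)).days <= window_days

if __name__ == "__main__":
    # Hypothetical pass dates for DS700, DS701 and DS702 (any order is allowed).
    dates = [date(2015, 1, 10), date(2015, 6, 1), date(2016, 1, 5)]
    print(within_window(dates))
```

Because only the earliest and latest dates matter, the order in which the exams are taken does not affect the result.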

Select Trainer for Demo


Sachin Adnaik
Certification: Ph.D. Statistics

Qualification
Ph.D. (Statistics)

Skills
Business Analysis, Machine Learning, Microsoft Azure, SAS, Statistics

Profile
Strong knowledge of R, SAS, RapidMiner, Machine Learning and Predictive Modeling. Has conducted trainings (for industry and for training institutes) on Data Science, Machine Learning, Predictive Modeling, and Statistics using R and other tools.
Disclaimer

** The above course information is taken from Cloudera Inc. and the Apache Software Foundation.

* Money-back guarantee until the demo and first class of the course.


Copyright ©2015 Hub4Tech.com, All Rights Reserved. Hub4Tech™ is registered trademark of Hub4tech Portal Services Pvt. Ltd.
All trademarks and logos appearing on this website are the property of their respective owners.