Course Batch Starts, Timing, Price & Enroll

Program Duration   Batch Starts   Time            Price
32 Hrs             Weekend        Morning-Batch   USD 400 / INR 20000
32 Hrs             Weekend        Evening-Batch   USD 400 / INR 20000
32 Hrs             Weekdays       Morning-Batch   USD 400 / INR 20000
32 Hrs             Weekdays       Evening-Batch   USD 400 / INR 20000

About Course

This course is aimed at:
  • Software Engineers
  • ETL Developers
  • Data Scientists
  • Analytics Professionals
  • Professionals looking for a career in Big Data
To become an expert in the Big Data Hadoop ecosystem, you need an in-depth understanding of building Spark applications with Scala. This course is designed to help you understand the core components of Apache Spark: Spark Streaming, RDDs, Spark SQL, DataFrames, Datasets, Spark MLlib, Spark GraphX, and the Spark shell. You will also learn how to customize Spark applications using Scala.
After completing this course you will be able to:

  • Understand the core concepts of Apache Spark
  • Write programs in Scala
  • Work with Spark on a cluster
  • Understand Spark features such as Spark Streaming, RDDs, and Spark SQL
  • Program with Spark MLlib and Spark GraphX
There is no formal prerequisite for this course, but a basic knowledge of any programming language, databases, SQL queries, and Linux will help you get through the material more quickly.


Apache Spark and Scala

  • 1.1 Introduction to Scala
  • 1.2 Install and configure Scala
  • 1.3 First program using Scala
  • 1.4 Different operators in Scala
  • 1.5 Functions and Loops
  • 1.6 Arrays, Maps, Lists, Tuples
  • 1.7 Collections
  • 1.8 OOP concepts and their use
  • 1.9 Traits as Interfaces
  • 2.1 Interactive Analysis with the Spark Shell
  • 2.2 RDD Operations
  • 2.3 Caching
  • 3.1 Linking with Spark
  • 3.2 Initializing Spark
  • 3.3 Resilient Distributed Datasets (RDDs)
  • 3.4 Parallelized Collections
  • 3.5 External Datasets
  • 3.6 RDD Operations
  • 3.7 Working with Key-Value Pairs
  • 3.8 Transformations
  • 3.9 Actions
  • 3.10 Shuffle operations
  • 3.11 RDD Persistence
  • 3.12 Shared Variables
  • 3.13 Deploying to a Cluster
  • 3.14 Unit Testing
  • 4.1 Linking
  • 4.2 Initializing StreamingContext
  • 4.3 Discretized Streams (DStreams)
  • 4.4 Input DStreams and Receivers
  • 4.5 Transformations on DStreams
  • 4.6 Output Operations on DStreams
  • 4.7 Accumulators and Broadcast Variables
  • 4.8 DataFrame and SQL Operations
  • 4.9 MLlib Operations
  • 4.10 Caching / Persistence
  • 4.11 Checkpointing
  • 4.12 Deploying Applications
  • 4.13 Monitoring Applications
  • 4.14 Reducing the Batch Processing Times
  • 4.15 Setting the Right Batch Interval
  • 4.16 Memory Tuning
  • 4.17 Fault-tolerance Semantics
  • 5.1 SQL
  • 5.2 Datasets and DataFrames
  • 5.3 Starting Point: SparkSession
  • 5.4 Creating DataFrames
  • 5.5 Running SQL Queries Programmatically
  • 5.6 Creating Datasets
  • 5.7 Data Sources
  • 5.8 Generic Load/Save Functions
  • 5.9 Parquet Files
  • 5.10 JSON Datasets
  • 5.11 Hive Tables
  • 5.12 JDBC To Other Databases
  • 5.13 Troubleshooting
  • 5.14 Performance Tuning
  • 5.15 Distributed SQL Engine
  • 6.1 Data types
  • 6.2 Basic statistics
  • 6.3 Classification and regression
  • 6.4 Collaborative filtering
  • 6.5 Clustering
  • 6.6 Dimensionality reduction
  • 6.7 Feature extraction and transformation
  • 6.8 Frequent pattern mining
  • 6.9 Evaluation metrics
  • 6.10 PMML model export
  • 7.1 The Property Graph
  • 7.2 Graph Operators
  • 7.3 Pregel API
  • 7.4 Graph Builders
  • 7.5 Vertex and Edge RDDs
  • 7.6 Optimized Representation
  • 7.7 Graph Algorithms - PageRank, Connected Components and Triangle Counting

Exam & Certification

Cloudera offers a certification exam, “CCA Spark and Hadoop Developer”, through which candidates demonstrate their knowledge of Spark and Big Data terminology.
Exam Name: CCA Spark and Hadoop Developer
Exam Code: CCA175
Number of Questions: 10–12 performance-based tasks on a CDH5 cluster
Time Limit: 120 minutes
Passing Score: 70%
Language: English, Japanese


Sanjay Kumar Singh
BE Computer Science
Professional Experience

Subject Expertise


  • Developed an IVR application using a third-party service (Plivo)
  • Moved the IVR from the third-party service to an in-house FreeSWITCH deployment
  • Defined the design of payment gateway services
Pooja Agarwal
Sanjay is an extremely hard-working and dedicated person with a tremendous zeal to learn new things. He is one of the few people I know who loves to be thrown into challenging work and always comes out with flying colors. As a person, he has an easy-going and cooperative approach, which makes him the best teammate to have.
Alok Sharma
Sanjay has a good eye for detail. He is very enthusiastic and keen to take on new challenges. With a very clear grasp of the basics, he is clear in his thinking and logic. Given his sound motivational and team retention skills, I see him as a very good manager in the near future.
Sujit Singh
Sanjay worked for me as an IT resource. He was very conscientious, responsible, and delivered on time.

** The above course information is taken from The Apache Software Foundation

* Money-back guarantee available through the demo and first class of the course.

Copyright ©2015, All Rights Reserved. Hub4Tech™ is a registered trademark of Hub4tech Portal Services Pvt. Ltd.
All trademarks and logos appearing on this website are the property of their respective owners.