Course Batch Starts, Timing, Price & Enroll

Program    Duration   Batch Starts   Time            Price
Weekend    32 Hrs     Weekend        Morning-Batch   USD 400 300
Weekend    32 Hrs     Weekend        Evening-Batch   USD 400 300
Weekdays   32 Hrs     Weekdays       Morning-Batch   USD 400 300
Weekdays   32 Hrs     Weekdays       Evening-Batch   USD 400 300

About Course

This course is intended for:
Software Engineers
ETL Developers
Data Scientists
Analytics Professionals
Professionals looking for a career in Big Data
To become an expert in the Big Data Hadoop ecosystem, you need an in-depth understanding of Spark applications written in Scala. This course is designed to help you understand the core components of Apache Spark, such as Spark Streaming, RDDs, Spark SQL, DataFrames, Datasets, Spark MLlib, Spark GraphX, and the Spark Shell. In this course you will also learn how to customize Spark applications using Scala.
After completing this course you will be able to:

Understand the core concepts of Apache Spark
Use Scala to write programs
Work with Spark on a cluster
Understand Spark features such as Spark Streaming, RDDs, and Spark SQL
Program with Spark MLlib and Spark GraphX
There is no formal prerequisite for this course, but a fundamental knowledge of any programming language, databases, SQL queries, and the basics of Linux will help you move through it quickly.

CURRICULUM

Apache Spark and Scala

  • 1.1 Introduction to Scala
  • 1.2 Install and configure Scala
  • 1.3 First program using Scala
  • 1.4 Different operators in Scala
  • 1.5 Functions and Loops
  • 1.6 Array, Map, Lists, Tuples
  • 1.7 Collection
  • 1.8 OOPs concept and their use
  • 1.9 Traits as Interfaces
  • 2.1 Interactive Analysis with the Spark Shell
  • 2.2 RDD Operations
  • 2.3 Caching
  • 3.1 Linking with Spark
  • 3.2 Initializing Spark
  • 3.3 Resilient Distributed Datasets (RDDs)
  • 3.4 Parallelized Collections
  • 3.5 External Datasets
  • 3.6 RDD Operations
  • 3.7 Working with Key-Value Pairs
  • 3.8 Transformations
  • 3.9 Actions
  • 3.10 Shuffle operations
  • 3.11 RDD Persistence
  • 3.12 Shared Variables
  • 3.13 Deploying to a Cluster
  • 3.14 Unit Testing
  • 4.1 Linking
  • 4.2 Initializing StreamingContext
  • 4.3 Discretized Streams (DStreams)
  • 4.4 Input DStreams and Receivers
  • 4.5 Transformations on DStreams
  • 4.6 Output Operations on DStreams
  • 4.7 Accumulators and Broadcast Variables
  • 4.8 DataFrame and SQL Operations
  • 4.9 MLlib Operations
  • 4.10 Caching / Persistence
  • 4.11 Checkpointing
  • 4.12 Deploying Applications
  • 4.13 Monitoring Applications
  • 4.14 Reducing the Batch Processing Times
  • 4.15 Setting the Right Batch Interval
  • 4.16 Memory Tuning
  • 4.17 Fault-tolerance Semantics
  • 5.1 SQL
  • 5.2 Datasets and DataFrames
  • 5.3 Starting Point: SparkSession
  • 5.4 Creating DataFrames
  • 5.5 Running SQL Queries Programmatically
  • 5.6 Creating Datasets
  • 5.7 Data Sources
  • 5.8 Generic Load/Save Functions
  • 5.9 Parquet Files
  • 5.10 JSON Datasets
  • 5.11 Hive Tables
  • 5.12 JDBC To Other Databases
  • 5.13 Troubleshooting
  • 5.14 Performance Tuning
  • 5.15 Distributed SQL Engine
  • 6.1 Data types
  • 6.2 Basic statistics
  • 6.3 Classification and regression
  • 6.4 Collaborative filtering
  • 6.5 Clustering
  • 6.6 Dimensionality reduction
  • 6.7 Feature extraction and transformation
  • 6.8 Frequent pattern mining
  • 6.9 Evaluation metrics
  • 6.10 PMML model export
  • 7.1 The Property Graph
  • 7.2 Graph Operators
  • 7.3 Pregel API
  • 7.4 Graph Builders
  • 7.5 Vertex and Edge RDDs
  • 7.6 Optimized Representation
  • 7.7 Graph Algorithms - PageRank, Connected Components and Triangle Counting
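
As a small taste of the Scala topics listed in items 1.6 to 1.9 (collections, tuples, OOP, and traits), here is a minimal sketch in plain Scala. It is not taken from the course material, it does not use Spark, and every name in it (Describable, Course, CurriculumDemo, wordCounts) is hypothetical and chosen only for illustration.

```scala
// Illustrative sketch only: Describable, Course, and CurriculumDemo are
// hypothetical names, not part of the course or of any library.

// A trait used as an interface (topic 1.9).
trait Describable {
  def describe: String
}

// A case class that implements the trait (topic 1.8).
final case class Course(name: String, hours: Int) extends Describable {
  def describe: String = s"$name ($hours hrs)"
}

object CurriculumDemo {
  // Count how often each word occurs, using standard collection
  // operations on a List and a Map (topics 1.6 and 1.7).
  def wordCounts(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+").toList) // split each line into words
      .filter(_.nonEmpty)                          // drop empty tokens
      .groupBy(identity)                           // group equal words together
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val course = Course("Apache Spark and Scala", 32)
    println(course.describe)

    val counts = wordCounts(List("spark and scala", "spark shell"))
    // Each entry is a (word, count) tuple (topic 1.6); sort by word for stable output.
    counts.toList.sortBy(_._1).foreach { case (word, n) => println(s"$word -> $n") }
  }
}
```

The flatMap, filter, and groupBy steps are ordinary Scala collection methods. Spark's RDD API offers transformations with similar names (topics 3.6 to 3.9), which is one reason Scala collection skills carry over to Spark programming.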

Exam & Certification

Cloudera offers a certification exam, “CCA Spark and Hadoop Developer”, that tests an individual’s knowledge of Spark and Big Data terminology.
Exam Name: CCA Spark and Hadoop Developer
Exam Code: CCA175
Number of Questions: 10–12 performance-based tasks on a CDH5 cluster
Time Limit: 120 minutes
Passing Score: 70%
Language: English, Japanese

Disclaimer

** The above course information is taken from the Apache Software Foundation.

* Money-back guarantee through the demo and the first class of the course.

Copyright © 2015 Hub4Tech.com, All Rights Reserved. Hub4Tech™ is registered trademark of Hub4tech Portal Services Pvt. Ltd.
All trademarks and logos appearing on this website are the property of their respective owners.