Spark SQL and Spark 3 using Scala (Formerly CCA175)

A comprehensive course on Spark SQL and the Spark Data Frame APIs, using Scala as the programming language.

What you'll learn:

  • Entire curriculum of CCA Spark and Hadoop Developer
  • Apache Sqoop
  • HDFS
  • Scala Fundamentals
  • Core Spark – Transformations and Actions
  • Spark SQL and Data Frames
  • Streaming analytics using Kafka, Flume and Spark Streaming

Requirements:

  • Basic programming skills
  • Cloudera Quickstart VM or valid account for IT Versity Big Data labs or any Hadoop clusters where Hadoop, Hive and Spark are well integrated.
  • A 64-bit operating system with at least the minimum memory required by the environment you are using

Description:

As part of this course, you will learn the key skills needed to build Data Engineering Pipelines using Spark SQL and the Spark Data Frame APIs, with Scala as the programming language. This course used to be a CCA 175 Spark and Hadoop Developer course for preparing for the Certification Exam. The exam was sunset on 10/31/2021, and we have renamed the course to Spark SQL and Spark 3 using Scala, as it covers industry-relevant topics beyond the scope of the certification.

About Data Engineering

Data Engineering is the processing of data according to downstream needs. As part of Data Engineering, we need to build different kinds of pipelines, such as Batch Pipelines and Streaming Pipelines. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, they are known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology for Data Engineering at scale.

I have prepared this course for anyone who would like to transition into a Data Engineer role using Spark (Scala). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache Spark.

Let us go through the details of what you will learn in this course. Keep in mind that the course is built around hands-on tasks, which will give you enough practice with the right tools. There are also plenty of tasks and exercises you can use to evaluate yourself.

Setup of Single Node Big Data Cluster

Many of you would like to transition to Big Data from Conventional Technologies such as Mainframes, Oracle PL/SQL, etc., and you might not have access to Big Data Clusters. It is very important for you to set up the environment in the right manner. Don't worry if you do not have a cluster handy; we will guide you through the setup with support via Udemy Q&A.

  • Set up an Ubuntu-based AWS Cloud9 Instance with the right configuration

  • Ensure Docker is set up

  • Set up Jupyter Lab and other key components

  • Set up and Validate Hadoop, Hive, YARN, and Spark

A quick recap of Scala

This course requires a decent knowledge of Scala. To make sure you understand Spark from a Data Engineering perspective, we have added a module to quickly warm up with Scala. If you are not familiar with Scala, we suggest you first go through relevant courses on Scala as a programming language.
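As a rough idea of the level of Scala involved, the sketch below filters and aggregates a small in-memory collection. It is not taken from the course material; the `Order` case class, the sample rows, and the `revenueByStatus` helper are hypothetical names made up for this illustration.

```scala
// Hypothetical record type used only for this illustration.
case class Order(id: Int, status: String, amount: Double)

object ScalaWarmup {
  // Made-up sample data.
  val orders: List[Order] = List(
    Order(1, "COMPLETE", 100.0),
    Order(2, "PENDING", 50.0),
    Order(3, "COMPLETE", 25.5)
  )

  // Total amount per status, using groupBy and map on plain Scala collections.
  def revenueByStatus(rows: List[Order]): Map[String, Double] =
    rows.groupBy(_.status).map { case (status, group) =>
      status -> group.map(_.amount).sum
    }

  def main(args: Array[String]): Unit = {
    // Keep only completed orders and print their ids.
    val completedIds = orders.filter(_.status == "COMPLETE").map(_.id)
    println(completedIds)
    println(revenueByStatus(orders))
  }
}
```

The same filter, group, and aggregate pattern reappears later in Spark, where it runs on distributed data instead of a local list.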

Data Engineering using Spark SQL

Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering Pipelines. Spark SQL combines the distributed computing capabilities of Spark with an easy-to-use, developer-friendly SQL-style syntax.

  • Getting Started with Spark SQL

  • Basic Transformations using Spark SQL

  • Managing Spark Metastore Tables - Basic DDL and DML

  • Managing Spark Metastore Tables - DML and Partitioning

  • Overview of Spark SQL Functions

  • Windowing Functions using Spark SQL
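To give a flavor of the topics listed above, here is a minimal sketch that registers a temporary view and runs a basic transformation plus a windowing function through `spark.sql`. It assumes Spark 3 is on the classpath and runs in local mode; the `orders` view, its columns, and the sample rows are invented for this illustration and do not come from the course data sets.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SparkSqlSketch {
  // Registers a small, made-up orders data set as a temporary view.
  def registerOrders(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(
      (1, "2021-01-01", "COMPLETE", 100.0),
      (2, "2021-01-01", "COMPLETE", 250.0),
      (3, "2021-01-01", "PENDING", 75.0),
      (4, "2021-01-02", "COMPLETE", 40.0)
    ).toDF("order_id", "order_date", "order_status", "order_amount")
      .createOrReplaceTempView("orders")
  }

  // Filters completed orders and ranks them by amount within each date.
  def rankCompletedOrders(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT order_id, order_date, order_amount,
        |       rank() OVER (PARTITION BY order_date
        |                    ORDER BY order_amount DESC) AS rnk
        |FROM orders
        |WHERE order_status = 'COMPLETE'""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]") // local mode is an assumption; a cluster would use YARN
      .getOrCreate()
    registerOrders(spark)
    rankCompletedOrders(spark).show()
    spark.stop()
  }
}
```

A temporary view is used here only to keep the sketch free of metastore setup; managed metastore tables with DDL, DML, and partitioning need a configured warehouse.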

Data Engineering using Spark Data Frame APIs

Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale leveraging distributed computing capabilities of Spark. Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.

  • Data Processing Overview using Spark Data Frame APIs leveraging Scala as Programming Language

  • Processing Column Data using Spark Data Frame APIs leveraging Scala as Programming Language

  • Basic Transformations using Spark Data Frame APIs leveraging Scala as Programming Language - Filtering, Aggregations, and Sorting

  • Joining Data Sets using Spark Data Frame APIs leveraging Scala as Programming Language
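The topics above (column processing, filtering, aggregations, sorting, and joins) can be sketched together in one short Data Frame pipeline. This is an illustrative example only, assuming Spark 3 in local mode; the `customers` and `orders` data, their column names, and the `revenuePerCustomer` helper are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, sum}

object DataFrameSketch {
  // Builds two made-up data sets, then filters, joins, aggregates, and sorts.
  def revenuePerCustomer(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val customers = Seq((1, "Asha"), (2, "Ben"))
      .toDF("customer_id", "customer_name")

    val orders = Seq(
      (101, 1, "COMPLETE", 100.0),
      (102, 1, "COMPLETE", 50.0),
      (103, 2, "PENDING", 75.0),
      (104, 2, "COMPLETE", 20.0)
    ).toDF("order_id", "customer_id", "order_status", "order_amount")

    orders
      .filter(col("order_status") === "COMPLETE")  // filtering
      .join(customers, Seq("customer_id"))         // inner join on the shared key
      .groupBy("customer_id", "customer_name")     // aggregation
      .agg(
        count(col("order_id")).as("order_count"),
        sum(col("order_amount")).as("revenue")
      )
      .orderBy(col("revenue").desc)                // sorting
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("data-frame-sketch")
      .master("local[*]") // local mode is an assumption for this sketch
      .getOrCreate()
    revenuePerCustomer(spark).show()
    spark.stop()
  }
}
```

The same result could be produced with a Spark SQL query; the method-chaining style tends to suit developers who prefer composing transformations in code.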

All the demos are given on our state-of-the-art Big Data cluster. You can get one month of complimentary lab access by emailing support@itversity.com with your Udemy receipt.

Course Details:

  • 24 hours on-demand video
  • 32 articles
  • 7 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of completion

Demo Link: https://www.udemy.com/course/cca-175-spark-and-hadoop-developer-certification-scala/