Course Description

Handle big data without exhausting resources such as memory. Gain practical knowledge of how to scale machine learning solutions with Apache Spark on the Databricks platform.

Solve data-parallel problems with Spark and model-parallel problems with Spark ML. Learn how to deploy Spark models for batch prediction and for real-time prediction through an endpoint implemented in SageMaker.

This course is delivered in collaboration with WeCloudData.

Course Details

Learning Outcomes

Upon completion of this course, successful students will be able to:

  • Implement machine learning models using Spark Machine Learning and GraphFrames to solve large-scale ML challenges in the retail and airline industries
  • Create and deploy machine learning models in the cloud using tools such as Amazon SageMaker
  • Construct machine learning pipelines using orchestration tools such as SageMaker Pipelines
  • Evaluate three popular use cases of scalable machine learning pipelines in real-time advertising, retail, and recommender systems


Topics

  • Mining Massive Data with Spark ML
  • Docker Containers 101
  • Building Data Pipelines with Airflow
  • ML Model Deployment with Amazon SageMaker
  • Use Case: Building Recommender Systems
  • Use Case: Sentiment Analysis at Scale

Who is this course for?

This course is designed for:

  • Data and business analytics professionals who wish to learn how to train large-scale models and deploy them in the cloud
  • IT and engineering professionals looking to receive hands-on training in ML and AI
  • Recent graduates and academics in Computer Science


Software Requirements

This course is built around AWS solutions and services. To complete the lab activities and mini-project, you will require:

  • An AWS account
  • Access to Databricks Community Edition (free)
  • Python version 3.5 or later


There are no mandatory prerequisites for this course. However, you are required to perform a self-assessment to ensure you meet the requirements to enrol.

Self-assessment for enrolment

  • A minimum of 1.5 years of working experience with the following skill sets: Python programming, big data tools, distributed systems and MapReduce, relational databases, Linux commands, a cloud platform (e.g. AWS Cloud), classical machine learning algorithms, and the scikit-learn library or equivalent.

Recommended Prerequisites:

  • DAT 210 Cloud Computing for Data Scientists
  • DAT 220 Big Data for Data Scientists

Available Sections

  • Online Synchronous, 5:00PM to 8:00PM, Nov 12, 2024 to Dec 03, 2024
  • Online Synchronous, 8:00AM to 4:00PM, Nov 16, 2024 to Dec 07, 2024

Course Fees

  • Flat Fee non-credit: $1,495.00

Required Software

  • Zoom web conferencing software
  • Laptop or computer installed with Windows or Mac OS
  • An AWS account
  • Access to Databricks Community Edition (free)
  • Python version 3.5 or later

Reading List / Textbook

No textbook required.

Section Notes

Classes are held online in real time (Mountain Time) at the specified times and dates.

This course uses:

  • D2L Learning Management System
  • Zoom web conferencing software

This course is delivered in an online blended format, meaning that some classes are taught in a live virtual session using Zoom, and some work has to be completed on a designated online e-learning platform on your own time.

For the best experience, you will require access to a computer with an Internet connection, a headset with speakers and a microphone, a webcam, and a monitor large enough to display multiple applications (or two monitors). Your computer and Internet connection should meet certain requirements. See the recommended requirements.

For more information, please visit our Online Learning Resources.

Students unfamiliar with online learning are encouraged to take our free Digital Skills for Learning Online course.

Unless otherwise stated, notice of withdrawal or transfer from a course must be received at least seven calendar days prior to the start date of the course.
