DAT 320 - Building Scalable Machine Learning Pipelines
Course Description
Handle big data without exhausting resources such as memory. Gain practical knowledge of how to scale machine learning solutions with Apache Spark on the Databricks platform.
Solve data-parallel problems with Spark and model-parallel problems with Spark ML. Learn how to deploy Spark models for batch prediction and for real-time prediction through endpoints hosted in Amazon SageMaker.
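To show what this looks like in practice, here is a minimal, hypothetical Spark ML pipeline sketch; the column names and toy data are illustrative and not drawn from the course materials:

```python
# Minimal Spark ML sketch: assemble features and fit a logistic regression.
# All column names and rows below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("spark-ml-demo").getOrCreate()

# Toy DataFrame with two numeric features and a binary label.
df = spark.createDataFrame(
    [(1.0, 0.5, 1), (0.2, 1.3, 0), (0.9, 0.1, 1), (0.1, 1.1, 0)],
    ["feature_a", "feature_b", "label"],
)

# Spark ML estimators expect a single vector column of features.
assembler = VectorAssembler(inputCols=["feature_a", "feature_b"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

# Chain the stages into a pipeline, fit it, and score the same toy data.
model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show()
```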
This course is delivered in collaboration with WeCloudData.
Course Details
Learning Outcomes
Upon completion of this course, successful students will be able to:
- Implement machine learning models using Spark ML and GraphFrames to solve large-scale ML challenges in the retail and airline industries
- Create and deploy machine learning models in the cloud using tools such as Amazon SageMaker (a minimal deployment sketch follows this list)
- Construct machine learning pipelines using orchestration tools such as SageMaker Pipelines
- Evaluate three popular use cases of scalable machine learning pipelines in real-time advertising, retail, and recommender systems
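To make the deployment outcome above concrete, the sketch below shows one way a trained model might be served as a real-time SageMaker endpoint; the S3 path, IAM role, and inference script are hypothetical placeholders, not course materials:

```python
# Hypothetical sketch: serve a trained scikit-learn model behind a real-time
# SageMaker endpoint. Artifact path, role ARN, and script name are placeholders.
import sagemaker
from sagemaker.sklearn import SKLearnModel

session = sagemaker.Session()

model = SKLearnModel(
    model_data="s3://my-bucket/model.tar.gz",             # placeholder artifact
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    entry_point="inference.py",                           # placeholder script
    framework_version="1.2-1",
    sagemaker_session=session,
)

# Provision a real-time HTTPS endpoint behind the model.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")

# Invoke the endpoint with one toy record, then delete it to stop billing.
print(predictor.predict([[0.1, 0.9]]))
predictor.delete_endpoint()
```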
Topics
- Mining Massive Data with Spark ML
- Docker Containers 101
- Building Data Pipelines with Airflow (a minimal DAG sketch follows this list)
- ML Model Deployment with Amazon SageMaker
- Use Case: Building Recommender Systems
- Use Case: Sentiment Analysis at Scale
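To illustrate the Airflow topic above, a minimal, hypothetical DAG might chain a feature-extraction task to a training task; both task bodies below are placeholders, not course code:

```python
# Hypothetical two-task Airflow DAG: extract features, then train a model.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_features():
    print("extracting features")  # placeholder for real Spark/ETL logic

def train_model():
    print("training model")  # placeholder for e.g. a SageMaker training job

with DAG(
    dag_id="ml_pipeline_demo",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    extract >> train  # training runs only after extraction succeeds
```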
Who is this course for?
This course is designed for:
- Data and business analytics professionals who wish to learn how to train large-scale models and deploy them in the cloud
- IT and engineering professionals looking for hands-on training in ML and AI
- Recent graduates and academics in computer science
Notes
Software Requirements
This course is built around AWS solutions and services. To complete the lab activities and the mini-project, you will need:
- An AWS account
- Access to the Databricks Community Edition (free)
- Python 3.5 or later
Prerequisites
There are no mandatory prerequisites for this course. However, you are required to complete a self-assessment to ensure you meet the requirements to enrol.
Self-assessment for enrolment
- A minimum of 1.5 years of working experience with the following skill sets: Python programming, big data tools, distributed systems and MapReduce, relational databases, Linux commands, cloud platforms (e.g. AWS), classical machine learning algorithms, and the scikit-learn library or equivalent.
Recommended Prerequisites:
- DAT 210 Cloud Computing for Data Scientists
- DAT 220 Big Data for Data Scientists
Applies Towards the Following Program(s)
- Deep Learning and Scalable Machine Learning: Required