Course Description

Tackle Big Data head-on! Make sense of the massive data volumes that connect the world. Navigate through structured and unstructured data to find relevant information. Explore distributed systems and big data through both theory and practical skills.

Learn how to overcome the limitations of Python and single-node architecture. Apply data analysis to massive datasets using the most popular Big Data tools and frameworks, such as Hadoop, Hive, TrinoDB, and Apache Spark.

Gain valuable insights into how large Canadian companies in Media, Retail, and Banking apply Big Data in practice.

This course is delivered in collaboration with WeCloudData.

Course Details

Learning Outcomes

By the completion of this course, successful students will be able to:

  • Explain the principles of big data and modern distributed computing frameworks such as distributed file systems, MapReduce, and columnar databases
  • Apply big data frameworks such as Apache Spark and TrinoDB to process massive datasets for data analytics and data science
  • Explain enterprise data lake concepts and their connection to the data warehouse
  • Construct big data solutions for at least one large dataset (billions of records) to discover insights with Spark on the Databricks platform


  • Introduction to distributed systems
  • Introduction to Databricks and EMR on AWS
  • Hadoop and MapReduce (Data Lake)
  • SQL for massive data mining (Hive, Presto, AWS Athena)
  • Big Data Processing with PySpark
  • Building your first big data application with Hadoop and Spark on AWS

Who is this course for?

This course is designed for:

  • Data and business analytics professionals who want to know how to handle big data workloads
  • Individuals who wish to pursue a career in Big Data
  • Recent graduates and academics in Computer Science


Software Requirements

This course is built around AWS solutions and services. To complete the lab activities and mini-project, you will require:

  • An AWS Account
  • Access to Databricks Community Edition (Free)
  • Python ver 3.5+


There are no mandatory prerequisites for this course. However, you are required to perform a self-assessment to ensure you meet the requirements to enroll.

Self-assessment for enrolment:

A minimum of one year of experience with the following skill sets: Python programming, algorithms and data structures, relational databases, Linux commands, and a cloud platform (e.g., AWS).

Schedule and Location

  • Online Synchronous, 5:00PM to 8:00PM, Feb 25, 2025 to Mar 25, 2025
  • Online Synchronous, 8:00AM to 4:00PM, Mar 01, 2025 to Mar 29, 2025

Course Fees

Flat Fee non-credit $1,495.00

Required Software

  • Zoom web conferencing software
  • Laptop or computer installed with Windows or Mac OS
  • An AWS Account
  • Access to Databricks Community Edition (Free)
  • Python ver 3.5+

Reading List / Textbook

No textbook required.

Section Notes

Classes are held online in real time (Mountain Time) at the specified times and dates.

This course uses:

  • D2L Learning Management System
  • Zoom web conferencing software

This course is delivered in an online blended format, meaning that some classes are taught in live virtual sessions using Zoom, and some work has to be completed on your own time in a designated online e-learning platform.

For the best experience, you will need access to a computer with an Internet connection, a headset with speakers and a microphone, a webcam, and a monitor large enough to display multiple applications (or two monitors). Your computer and Internet connection should meet certain requirements. See the recommended requirements.

For more information, please visit our Online Learning Resources.

Students unfamiliar with online learning are encouraged to take our free Digital Skills for Learning Online course.

Unless otherwise stated, notice of withdrawal or transfer from a course must be received at least seven calendar days prior to the start date of the course.
