Course Description

Tackle Big Data head-on! Make sense of the massive data volumes that connect the world. Navigate through structured and unstructured data to find relevant information. Explore distributed systems and big data through both theory and practical skills.

Learn how to overcome the limitations of Python on a single-node architecture. Apply data analysis to massive datasets using the most popular Big Data tools and frameworks, such as Hadoop, Hive, TrinoDB, and Apache Spark.

Gain valuable insights into how large Canadian companies in Media, Retail, and Banking apply Big Data in practice.

This course is delivered in collaboration with WeCloudData.

Course Details

Learning Outcomes

By the completion of this course, successful students will be able to:

  • Explain the principles of big data and modern distributed computing frameworks such as distributed file systems, MapReduce, and columnar databases
  • Apply big data frameworks such as Apache Spark and TrinoDB to process massive datasets for data analytics and data science
  • Explain enterprise data lake concepts and their connection to the data warehouse
  • Construct big data solutions for at least one large dataset (billions of records) to discover insights with Spark on the Databricks platform

Topics

  • Introduction to distributed systems
  • Introduction to Databricks and EMR on AWS
  • Hadoop and MapReduce (Data Lake)
  • SQL for massive data mining (Hive, Presto, AWS Athena)
  • Big Data Processing with PySpark
  • Building your first big data application with Hadoop and Spark on AWS

Who is this course for?

This course is designed for:

  • Data and business analytics professionals who want to know how to handle big data workloads
  • Individuals who wish to pursue a career in Big Data
  • Recent graduates and academics in Computer Science

Notes

Software Requirements

This course is built around AWS solutions and services. To complete the lab activities and mini-project, you will require:

  • An AWS Account
  • Access to Databricks Community Edition (Free)
  • Python version 3.5 or later

Prerequisites

There are no mandatory prerequisites for this course. However, you are required to perform a self-assessment to ensure you meet the requirements to enroll.

Self-assessment for enrolment:

A minimum of one year of experience with each of the following: Python programming, algorithms and data structures, relational databases, Linux commands, and a cloud platform (e.g. AWS Cloud).

Thank you for your interest...

Unfortunately, this course is not currently open for enrolment.

If you have a Professional and Continuing Education account, take note of the course number and submit a course inquiry to be notified if new sections become available.
