DAT 220 - Big Data for Data Scientists

Delivery Options: Online Synchronous

UCalgary Continuing Education

Course Description

Tackle Big Data head-on! Make sense of the massive data volumes that connect the world. Navigate through structured and unstructured data to find relevant information. Explore distributed systems and big data through both theory and practical skills.

Learn how to overcome the limitations of Python and single-node architecture. Apply data analysis to massive datasets using the most popular Big Data tools and frameworks, such as Hadoop, Hive, TrinoDB, and Apache Spark.

Gain valuable insights into how large Canadian companies in Media, Retail, and Banking apply Big Data in practice.

This course is delivered in collaboration with WeCloudData.

Course Details

Learning Outcomes

By the completion of this course, successful students will be able to:

Explain the principles of big data and modern distributed computing frameworks such as distributed file systems, MapReduce, and columnar databases
Apply big data frameworks such as Apache Spark and TrinoDB to process massive datasets for data analytics and data science
Explain enterprise data lake concepts and their connection to the data warehouse
Construct big data solutions for at least one large dataset (billions of records) to discover insights with Spark on the Databricks platform

Topics

Introduction to distributed systems
Introduction to Databricks and EMR on AWS
Hadoop and MapReduce (Data Lake)
SQL for massive data mining (Hive, Presto, AWS Athena)
Big Data Processing with PySpark
Building your first big data application with Hadoop and Spark on AWS

Who is this course for?

This course is designed for:

Data and business analytics professionals who want to know how to handle big data workloads
Individuals who wish to pursue a career in Big Data
Recent graduates and academics in Computer Science

Notes

Software Requirements

This course is built around AWS solutions and services. To complete the lab activities and mini-project, you will require:

An AWS Account
Access to Databricks Community Edition (Free)
Python ver 3.5+

Prerequisites

There are no mandatory prerequisites for this course. However, you are required to perform a self-assessment to ensure you meet the requirements to enroll.

Self-assessment for enrolment:

A minimum of one year of experience with the following skills: Python programming, algorithms and data structures, relational databases, Linux commands, and a cloud platform (e.g., AWS).

Recommended Prerequisites:

ICT 128 Relational Databases Fundamentals
DAT 210 Cloud Computing for Data Scientists

Applies Towards the Following Program(s)

Big Data in Cloud: Required

Enrol Now - Select a section to enrol in

Section: DAT 220 - 005
Start date: Feb 25, 2025
Delivery: Online Synchronous
Status: Available
Dates: Feb 25, 2025 to Mar 29, 2025

Schedule

Online Synchronous, 5:00PM to 8:00PM, Feb 25, 2025 to Mar 25, 2025
Online Synchronous, 8:00AM to 4:00PM, Mar 01, 2025 to Mar 29, 2025

Hours: 45.0
Delivery Options: Online Synchronous
Course Fee: Flat Fee non-credit, $1,495.00

Required Software

Zoom web conferencing software
Laptop or computer installed with Windows or Mac OS
An AWS Account
Access to Databricks Community Edition (Free)
Python ver 3.5+

Reading List / Textbook

No textbook required.

Section Notes

Classes are held online in real time (Mountain Time) at the specified times and dates.

This course uses:

D2L Learning Management System
Zoom web conferencing software

This course is delivered in an online blended format: some classes are taught in live virtual sessions using Zoom, and some work has to be completed on your own time in a designated online e-learning platform.

For the best experience, you will need a computer with an Internet connection, a headset with speakers and a microphone, a webcam, and a monitor large enough to display multiple applications (or two monitors). Your computer and Internet connection should meet certain requirements. See the recommended requirements.

For more information, please visit our Online Learning Resources.

Students unfamiliar with online learning are encouraged to take our free Digital Skills for Learning Online course.

Unless otherwise stated, notice of withdrawal or transfer from a course must be received at least seven calendar days prior to the start date of the course.