DAT 340 - Data Engineering on Microsoft Azure with Project
Course Description
Learn about data engineering as it pertains to building batch and real-time analytical solutions using Azure data platform technologies. You will begin by understanding the core compute and storage technologies that are used to build an analytical solution.
You will interactively explore data stored in files in a data lake and learn the various ingestion techniques that can be used to load data, either with the Apache Spark capability found in Azure Synapse Analytics or Azure Databricks, or with Azure Data Factory or Azure Synapse Pipelines.
You will learn various ways to transform data and understand the importance of implementing security to ensure that data is protected at rest and in transit. You will also create real-time analytical systems and solutions.
You will have the opportunity to complete a practical project that applies the skills and techniques you have learned.
This course covers the objectives for Microsoft Exam DP-203: Data Engineering on Microsoft Azure.
The University of Calgary is a Microsoft Education Global Training Partner.
Course Details
Learning Outcomes
- Explore compute and storage options for data engineering workloads in Azure
- Run interactive queries using serverless SQL pools
- Perform data exploration and transformation in Azure Databricks
- Explore, transform, and load data into the data warehouse using Apache Spark
- Ingest and load data into the data warehouse
- Transform data with Azure Data Factory or Azure Synapse Pipelines
- Integrate data from notebooks with Azure Data Factory or Azure Synapse Pipelines
- Support hybrid transactional/analytical processing (HTAP) with Azure Synapse Link
- Implement end-to-end security with Azure Synapse Analytics
- Perform real-time stream processing with Azure Stream Analytics
- Create a stream processing solution with Azure Event Hubs and Azure Databricks
- Implement an industry-relevant case study project
Topics
- Introduction to Azure Synapse Analytics, Azure Databricks, Azure Data Lake Storage, and Delta Lake architecture
- Work with data streams by using Azure Stream Analytics
- Data querying, metadata object creation, data security, and user management in Azure Synapse serverless SQL pools
- Read/write data operations and DataFrames in Azure Databricks
- Big data engineering with Apache Spark in Azure Synapse Analytics
- Integrate SQL and Apache Spark pools in Azure Synapse Analytics
- Data loading best practices in Azure Synapse Analytics
- Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipelines
- Data Integration with Azure Data Factory or Azure Synapse Pipelines
- Code-free transformation at scale with Azure Data Factory or Azure Synapse Pipelines
- Orchestrate data movement and transformation in Azure Data Factory or Azure Synapse Pipelines
- Secure a data warehouse, configure and manage secrets, and implement compliance controls for sensitive data
- Design hybrid transactional and analytical processing
- Configure Azure Synapse Link with Azure Cosmos DB
- Query Azure Cosmos DB with Apache Spark and serverless SQL pools in Azure Synapse Analytics
- Enable reliable messaging for big data applications using Azure Event Hubs
- Ingest data streams by using Azure Stream Analytics
- Process streaming data with Azure Databricks Structured Streaming
- Stream data from a file and write it out to a distributed file system; connect to Event Hubs to read and write streams
- Use sliding windows to aggregate over chunks of data rather than over all data
- Apply watermarking
Notes
This course includes hands-on activities to reinforce the concepts taught and provide a practical learning experience.
Lab access and an Azure Student Pass will be provided at no additional cost.
Prerequisites
No mandatory prerequisites.
Self-assessment for enrolment:
A minimum of 6 months of relevant work experience and knowledge in:
- Data processing languages, such as SQL, Python, or Scala
- Parallel processing and data architecture patterns
OR
Recommended prerequisites:
- ICT 905 Microsoft Azure Data Fundamentals
- ICT 128 Relational Database Fundamentals
- ICT 778 Python Foundations