Handle big data without exhausting resources such as memory. Gain practical knowledge of how to scale machine learning solutions with Apache Spark on the Databricks platform.
Solve data-parallel problems with Spark and model-parallel problems with Spark ML. Learn how to deploy Spark models for batch prediction and for real-time prediction through an endpoint implemented in Amazon SageMaker.
This course is delivered in collaboration with WeCloudData.
Upon completion of this course, successful students will be able to:
- Implement machine learning models using Spark ML and GraphFrames to solve large-scale ML challenges in the retail and airline industries
- Create and deploy machine learning models in cloud computing using tools like Amazon SageMaker
- Construct machine learning pipelines using orchestration tools like SageMaker Pipelines
- Evaluate three popular use cases of scalable machine learning pipelines in real-time advertising, retail, and recommender systems
Course topics
- Mining Massive Data with Spark ML
- Docker Containers 101
- Building Data Pipelines with Airflow
- ML Model Deployment with Amazon SageMaker
- Use Case: Building Recommender Systems
- Use Case: Sentiment Analysis at Scale
Who is this course for?
This course is designed for:
- Data and business analytics professionals who wish to learn how to train large-scale models and deploy them in the cloud
- IT and engineering professionals looking to receive hands-on training in ML and AI
- Recent graduates and academics in Computer Science
This course is built around AWS solutions and services. To complete the lab activities and mini-project, you will require:
- An AWS account
- Access to Databricks Community Edition (free)
- Python 3.5 or later
There are no mandatory prerequisites for this course. However, you are required to complete a self-assessment to ensure you meet the requirements to enrol.
Self-assessment for enrolment
- A minimum of 1.5 years of working experience with the following skills: Python programming, big data tools, distributed systems and MapReduce, relational databases, Linux commands, a cloud platform (e.g. AWS), classical machine learning algorithms, and the scikit-learn library or equivalent.
- DAT 210 Cloud Computing for Data Scientists
- DAT 220 Big Data for Data Scientists
Applies Towards the Following Program(s)
- Deep Learning and Scalable Machine Learning: Required