Join the data engineering revolution

Learn from leading industry experts working at the world's most innovative companies

16 weeks of part-time classes that adapt to your busy schedule

Learn from anywhere in our virtual classroom with live lectures and hands-on labs

20 Seats Remaining

Curriculum

Master the skills to become an effective data engineer with the modern data stack in 16 weeks. 

Master the core concepts and primitives used in data engineering around ETL. Abstractions and tools such as Airflow and Airbyte are built on top of the core concepts and primitives taught in this topic. 

  • Python virtual environments 
  • ETL with interactive Jupyter Notebooks
  • Data extraction patterns from APIs (full vs incremental) 
  • Data transformation using dataframes (Pandas) 
  • Data loading patterns to files (CSV, Parquet) and databases (SQLAlchemy, PostgreSQL) 
  • Functional Programming with Python 
  • Modular programming using Python modules
  • Object Oriented Programming with Python
  • Logging with Python (PostgreSQL) 
  • Unit and integration testing with Python 
  • Code linting with Python
  • Metadata config pipeline (YAML) 
  • Metadata logging to database 
  • Cron scheduling 
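
As a rough illustration of the full versus incremental extraction patterns listed above, the sketch below filters an in-memory list that stands in for an API response. The record layout, the `updated_at` field, and the function names are hypothetical and are not taken from the course material.

```python
from datetime import datetime

# Hypothetical records standing in for an API response (not course data).
RECORDS = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "updated_at": datetime(2024, 1, 3)},
]


def extract_full(records):
    """Full extract: return every record on every run."""
    return list(records)


def extract_incremental(records, watermark):
    """Incremental extract: return only records changed after the watermark."""
    return [r for r in records if r["updated_at"] > watermark]


def next_watermark(extracted, previous):
    """Advance the watermark to the newest timestamp seen, if any."""
    return max((r["updated_at"] for r in extracted), default=previous)


if __name__ == "__main__":
    watermark = datetime(2024, 1, 1)
    new_rows = extract_incremental(RECORDS, watermark)
    print([r["id"] for r in new_rows])
    print(next_watermark(new_rows, watermark))
```

A full extract is simpler but re-reads everything on each run. An incremental extract reads less, but it has to store the watermark between runs.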

Continue down the path of mastering core concepts and primitives used in data engineering. In this topic, we learn the ELT pattern, a relatively new addition to data engineering that grew out of the rapid rise in cloud adoption. 

  • Data extraction patterns from databases (full vs incremental vs CDC) 
  • Data loading patterns to databases (overwrite vs insert vs upsert vs merge) 
  • SQL transformations in databases (PostgreSQL)
  • Jinja for SQL templating (Jinja, Python, SQLAlchemy)  
  • Directed acyclic graphs (DAGs) with Python 
  • SQL Common Table Expressions (CTEs) 
  • SQL Window Functions 
  • Modularising ELT pipelines 
  • Unit testing ELT pipelines 
  • Logging for ELT pipelines (PostgreSQL) 
  • Metadata config for ELT pipelines (YAML) 
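
To make the difference between the overwrite and upsert loading patterns listed above concrete, here is a minimal sketch. It assumes an in-memory SQLite database in place of PostgreSQL so that it runs without a server. The `customers` table, its columns, and the function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
# The "customers" table and its columns are hypothetical.


def create_target(conn):
    """Create the target table with a primary key to upsert against."""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")


def load_overwrite(conn, rows):
    """Overwrite: clear the target, then insert the full extract."""
    conn.execute("DELETE FROM customers")
    conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)


def load_upsert(conn, rows):
    """Upsert: insert new keys and update rows whose key already exists."""
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?) "
        "ON CONFLICT (id) DO UPDATE SET name = excluded.name",
        rows,
    )


def read_all(conn):
    """Return all rows ordered by key."""
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_target(conn)
    load_overwrite(conn, [(1, "Ada"), (2, "Grace")])
    load_upsert(conn, [(2, "Grace Hopper"), (3, "Edsger")])
    print(read_all(conn))
```

Overwrite pairs naturally with a full extract, while upsert pairs with an incremental extract because it touches only the rows that arrived in the current run.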

Master the concepts to containerize, build, and deploy ETL pipelines into a production environment hosted on the cloud. Enable code versioning and team collaboration best practices through Git. 

  • Git:
    • Git version control system 
    • Git workflow (add, commit, push, merge, pull) 
    • Git branching 
    • GitHub pull requests 
  • Docker:
    • Computer vs Virtual Machine vs Docker 
    • Docker image and container 
    • Docker commands 
    • Dockerfile 
    • Docker volumes 
    • Docker compose
    • Docker repository 
    • Containerize an ETL pipeline 
  • Cloud (AWS): 
    • Identity and Access Management (IAM) – Users, Policies, Groups, and Roles 
    • Relational Database Service (RDS)
    • Simple Storage Service (S3) 
    • AWS CLI and Boto3 
    • Elastic Container Registry (ECR) 
    • Elastic Container Service (ECS) 
    • Deploy and schedule an ETL pipeline on ECS

Create an end-to-end ETL pipeline using extract, load, and transform patterns covered in earlier topics. 

  • Extract data from a data source of your choosing, load data into a data store of your choosing, and transform data to support a use-case of your choosing. 
  • Apply metadata logging, metadata configuration, unit and integration testing, and cron scheduling for your pipeline. 
  • Use Git and GitHub to apply code versioning, Git branching, and pull requests. 
  • Containerize your ETL pipeline using Docker, and deploy the pipeline on the cloud.

Most businesses generate and store data in multiple systems such as Customer Relationship Management (CRM) systems, Order Management Systems (OMS), Accounting systems, Marketing platforms, and many more. Handcrafting the Extract and Load logic for each system is a tedious process that can be automated using data integration tools such as Airbyte. 

  • Airbyte sources, destinations, and connections 
  • Airbyte extract patterns (full, incremental, CDC) 
  • Airbyte load patterns (overwrite, insert, upsert, merge) 
  • Octavia CLI 
  • Airbyte API 
  • Airbyte Custom Connectors 
  • Deploying Airbyte on AWS EC2
  • End-to-end ELT pipeline with Airbyte on AWS

As businesses scale, their data volumes increase until it is no longer viable to process data on a single compute instance. To solve scale issues, we look at technologies such as Snowflake, which are designed to perform analytical processing on large volumes of data. 

We streamline our Transform pipeline development process using dbt, which popularized a subfield of data engineering known as Analytics Engineering. 

  • OLAP vs OLTP 
  • Snowflake architecture 
  • Snowflake RBAC 
  • Loading data into Snowflake 
  • Parsing JSON with Snowflake 
  • Snowflake micro-partitions and clustering
  • dbt project 
  • dbt commands (run, test, build, list)
  • Writing and running a dbt model
  • dbt seeds, tests, and macros
  • dbt docs
  • dbt in production (profiles, targets, and deploy on AWS)

Data engineering does not exist in a vacuum. As data engineers, we transform and model data for our end users to power use-cases such as machine learning, business intelligence, and analytics. To model the data with the software engineering principles of modularity and reusability, we apply data modelling techniques such as dimensional modelling. To enable end users to slice and dice the models that we produce, we provide a layer on top of the data warehouse known as the semantic layer. 

  • Normalization vs Denormalization
  • Data modelling concepts: 
    • Dimensional modelling by Ralph Kimball (also known as Star Schema) 
    • Data warehouse modelling by Bill Inmon 
    • Data vault modelling by Dan Linstedt 
    • One Big Table (OBT) 
  • Applied dimensional modelling using dbt: 
    • Fact and dimension tables
    • dbt snapshots and Slowly Changing Dimensions (SCD) 
    • Transactional fact table 
    • Snapshot fact table 
    • Accumulating snapshot fact table 
    • Factless fact table 
    • Incremental fact table load 
  • Semantic modelling: 
    • Semantic modelling concepts and tools 
    • Semantic modelling and metrics using Preset 
    • Preset chart
    • Preset dashboard
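
The Slowly Changing Dimensions item above can be illustrated with a small sketch of the Type 2 approach, where a changed attribute closes the current row and adds a new one instead of updating in place. This is plain Python and not how dbt implements snapshots. The `customer_id`, `city`, `valid_from`, and `valid_to` fields and the `apply_scd2` function are hypothetical.

```python
from datetime import date


def apply_scd2(dimension, incoming, as_of):
    """Return a new dimension with Type 2 history applied.

    dimension: rows with customer_id, city, valid_from, valid_to
               (valid_to is None for the current version of a customer).
    incoming:  latest source rows with customer_id and city.
    as_of:     the date on which the change is recorded.
    """
    result = [dict(row) for row in dimension]  # copy so the input is untouched
    current = {r["customer_id"]: r for r in result if r["valid_to"] is None}

    for rec in incoming:
        existing = current.get(rec["customer_id"])
        if existing is not None and existing["city"] == rec["city"]:
            continue  # nothing changed, keep the current version
        if existing is not None:
            existing["valid_to"] = as_of  # close the old version
        new_row = {
            "customer_id": rec["customer_id"],
            "city": rec["city"],
            "valid_from": as_of,
            "valid_to": None,
        }
        result.append(new_row)
        current[rec["customer_id"]] = new_row
    return result


if __name__ == "__main__":
    dim = [{"customer_id": 1, "city": "Sydney",
            "valid_from": date(2023, 1, 1), "valid_to": None}]
    source = [{"customer_id": 1, "city": "Melbourne"},
              {"customer_id": 2, "city": "Perth"}]
    for row in apply_scd2(dim, source, date(2024, 1, 1)):
        print(row)
```

Keeping both the closed and the current row lets a fact table join to the version of the customer that was valid when the fact occurred.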

Spark is a distributed data processing system capable of processing large volumes of data. Databricks provides an ecosystem of tooling to enable Spark to run for multiple use-cases such as data engineering, stream processing, and machine learning. 

We discover how Spark uses the separation of storage and compute to enable scale, learn about the delta file format, use Spark for data engineering, and apply data quality tests using Great Expectations. 

  • Big data processing architectures
  • Spark internals and core concepts 
  • Spark reading and writing 
  • Spark SQL 
  • Spark DataFrame 
  • Spark joins, group by, and aggregation 
  • Spark UDF 
  • Spark query plan and optimization 
  • Spark partition keys  
  • ACID file formats (delta file format) 
  • Data orchestration using Databricks Workflows
  • Manage the Databricks workspace using API and CLI 
  • Data quality testing with Great Expectations

Create an end-to-end ETL pipeline capable of processing large volumes of data. 

  • Extract data from a data source of your choosing, load data into a data store of your choosing, and transform data to support a use-case of your choosing. 
  • Use Airbyte to perform data integration, an abstraction layer over low-level Extract and Load patterns. 
  • Transform data using compute engines from the Data Warehousing paradigm (Snowflake), or the Data Lakehouse paradigm (Databricks). 
  • Apply transformation logic using traditional ETL patterns, or Analytics Engineering patterns using dbt. 
  • Apply metadata logging, metadata configuration, unit and integration testing, and cron scheduling for your pipeline. 
  • Use Git and GitHub to apply code versioning, Git branching, and pull requests. 
  • Deploy your end-to-end ETL pipeline on the cloud.

 

Data orchestration enables data engineers to stitch together different parts of ETL into a single cohesive pipeline. Data orchestration makes it easy to trigger, schedule, monitor, and configure alerts for the pipelines. Data orchestrators like Airflow come with plugins or providers to connect existing tools in your data stack like Airbyte, dbt, Snowflake and Databricks, so that you can easily orchestrate steps between them. 

  • Data orchestration architecture and patterns 
  • Airflow DAGs, Operators and Tasks
  • Airflow schedule 
  • Airflow patterns for catchup, idempotence, backfill, and branching
  • Airflow variables, connections, hooks, and providers 
  • Airflow cross communication (XComs) 
  • Airflow sensors
  • Airflow dynamic task mapping and TaskFlow
  • Trigger Rules
  • Watcher Pattern
  • Deploy Airflow locally, and on AWS using EC2 or MWAA 
  • Extending the Airflow Docker image
  • Airflow providers for Airbyte, dbt, Databricks, Snowflake, and Slack alerts

Enable real-time insights from fast-moving data. Learn the core concepts and primitives of stream processing using Kafka, and deploy Kafka topics on Confluent Cloud. Integrate real-time events into ClickHouse, a real-time database, and perform data transformation in ClickHouse. Define and test ClickHouse materialized views using dbt. 

  • Streaming concepts 
  • Kafka key concepts
  • Kafka broker and topics  
  • Kafka CLI
  • Creating a Python Kafka producer
  • Creating a Python Kafka consumer
  • Stream analytics using ksqlDB
  • Deploy Kafka on Confluent Cloud 
  • Real-time databases with ClickHouse 
  • ClickHouse architecture and internals 
  • Use Kafka Connect to integrate data into ClickHouse
  • Tables, views, and materialized views on ClickHouse
  • dbt to test and deploy ClickHouse objects 
  • Real-time dashboard with Preset, ClickHouse, and Kafka

As the data engineering team grows, so does the code complexity. To provide assurance that data engineers are doing the right things, automated code integration pipelines can be used to test and verify a data engineer’s code changes in a separate branch-based environment. After code has been validated, code can be automatically built and released into various deployment environments such as staging and production. 

  • Principles of DataOps 
  • Continuous integration pipelines: 
    • Unit testing 
    • Code linting tests 
    • Data quality testing 
    • Branch-based testing environments 
    • CI pipelines for dbt 
    • CI pipelines for Python ETL 
  • Continuous deployment pipelines: 
    • Containerize and build 
    • Deployment environments 
    • Deploy using Infrastructure as Code (IaC)

Showcase all the skills and technologies you have learnt throughout the bootcamp to future employers. Implement either a lambda or kappa architecture with ETL pipelines capable of processing large volumes of data.

  • Extract data from a data source of your choosing, load data into a data store of your choosing, and transform data to support a use-case of your choosing. 
  • Apply unit testing, data quality testing, and monitoring over your ETL pipelines 
  • Use Git and GitHub to apply code versioning, Git branching, and pull requests 
  • Implement CI pipelines to automatically test your code in a branch-based environment 
  • Implement CD pipelines to automatically deploy your end-to-end ETL pipeline on the cloud 
  • Present your capstone projects at the Demo Day to data engineering experts working in startups or large companies

Real-world projects

Graduate with a portfolio of professional projects that you can showcase to the world. Take a look at some projects below from our most recent cohort. 

Demo day

Kickstart your career in data engineering by presenting your capstone project to Data Engineers and representatives working at startups and large companies. 

Capstone project: Musically inclined

A batch and streaming data pipeline that analyzes music streaming service event data. Built using Kafka, ClickHouse, Snowflake, dbt, Preset, AWS, Azure, Docker, and GitHub Actions.

Paul Hallaste, Analytics Lead at Fidelity International

Alexander Potts, Data Engineer at Endeavour Group

Capstone project: Twitter keywords

A keyword frequency dashboard to support Search Engine Optimization (SEO) analysis. Built using Airbyte, Python, AWS (S3, EC2), Snowflake, dbt, Airflow, Lightdash, and GitHub Actions. 

Rashid Mohammed, Data Analyst at Heritage Bank

Join our next bootcamp

December 2023 Cohort

  16 weeks, 4 December 2023 – 8 April 2024
(2-week non-teaching break from 25 December 2023 to 4 January 2024)

  Monday, Tuesday, and Thursday
10:00am – 1:00pm (UTC)

  Office hours, Saturday 12:00am – 2:00am (UTC)

  SOLD OUT

April 2024 Cohort

  16 weeks, 29 April 2024 – 19 August 2024

  Monday, Tuesday, and Thursday
10:00am – 1:00pm (UTC)

  Office hours, Saturday 12:00am – 2:00am (UTC)

  20 seats available

Our industry-leading instructors

Jonathan Neo

Jonathan is a Data Engineer at Canva where he is building data platforms to empower teams to unlock insights to their products. He has previously worked at EY, Telstra Purple, and Mantel Group, where he has led data engineering teams, built nearly a dozen data platforms, and developed new products and offerings. Jonathan has taught over a hundred data professionals who are now working at leading technology companies around the world.

Hengji Liu

Hengji is a Data Engineer at Canva where he manages data architecture and pipelines for his group. He previously worked at Servian, a boutique consultancy focused on data solutions, before joining the Macquarie Group as an internal-facing data consultant. Hengji’s extensive consulting experience has made him an excellent listener and explainer.

Pavan Raju

Pavan is a Data Engineer at Deloitte and brings software engineering rigour to data problems for his clients. He previously worked at Commonwealth Bank as a Software Engineer, before joining Tyro Payments where he implemented streaming pipelines using Kafka. Pavan is passionate about helping students achieve their “aha” moment by making complex subjects simple. 

Prerequisites

Python – You are comfortable with variables, lists, dictionaries, functions, loops, conditionals, and using Python libraries. 

				
    # Set to True if the concepts listed above feel familiar.
    understand = True

    if understand:
        print("You understand the basics")
    else:
        print("Take some time to learn the basics")
				
			

SQL – You are comfortable with Data Manipulation Language (DML) such as select, group by, where, having, insert, delete, and update. You are comfortable with Data Definition Language (DDL) such as create table and alter table.

				
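    -- self_check is a one-row inline table standing in for your own assessment.
    SELECT
        CASE
            WHEN understand = TRUE THEN 'You understand the basics'
            ELSE 'Take some time to learn the basics'
        END AS message
    FROM (SELECT TRUE AS understand) AS self_check;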

				
			

Support

Career services

  1:1 coaching with a career coach

Receive guidance on your career trajectory, resume review, and preparation for interviews.  

  1:1 expert advice from practitioners 

Receive expert advice from data engineering practitioners about industry trends, technology stack tradeoffs, and professional development. 

Learning assistance

  Ask questions in the live classes and office hours and your instructor will provide answers 

  Ask questions in the Slack channel #help and your peers or instructors will provide answers 

  Work on projects in a group and hold each other accountable

Alumni community

  Alumni Slack channel 

Join our alumni community Slack channel and stay in touch with your peers. 

  Alumni events 

Attend alumni-only events and network with other data engineers in the industry. 

Testimonials

I would recommend the bootcamp to two types of people. First, people who are interested in a career as a data engineer. The course teaches you the fundamentals for building a data stack using modern data tooling in a hands-on, steady-paced, rigorous manner. Second, people who work with data engineers and are interested in understanding the modern data stack and how different parts of the data engineering lifecycle work together. Within weeks, I was able to set up a Kafka pipeline, use dbt to transform data, and set up a CI/CD pipeline via GitHub Actions.

Paul Hallaste, Analytics Lead at Fidelity International

If you're looking to break into the industry, I think this bootcamp is probably your best choice. It will help you build a good understanding of how to combine all the tools into a single pipeline. Using high- and low-level programming, you build custom drivers, use the pre-built ones, and build a portfolio to showcase to potential employers. It will also provide interview tips, in addition to the opportunity to introduce your project to multiple companies at the end of the bootcamp. If you're looking for a career change, welcome to the opportunity.

Mantas Liutkus, Data and Automation Engineer at M Solutions Corp

I would 100% recommend Data Engineer Camp to both newbies to the discipline looking to change their careers, and seasoned pros looking to round out their knowledge. I learned something brand new, or went deeper on topics I already knew, almost every single week. The value of that really can't be overstated. The best thing about the bootcamp is definitely the level of detail. The bootcamp really went deep into Spark internals, which I was really impressed by, and that level of detail was maintained across every single topic.

Alexander Potts, Data Engineer at Endeavour Group

The best thing about the bootcamp is the people. To elaborate, the team not only work as data engineers in their day jobs, but they've also put this bootcamp together to share their knowledge and grow the industry. Learning from such passionate teachers is an inspiration for any student. Furthermore, learning alongside students who volunteer to give up their evenings and weekends has allowed me to connect and network with eager-to-learn, like-minded industry professionals, which is invaluable in one's career.

Luke Huntley, Data Engineer at Western Power

I wanted a course that could give me a comprehensive run through of what the landscape is like. [...] I think the boot camp is really great for those who are not yet confident about their skills, and for those who like to learn by doing things in a structured and guided manner with more finesse and detail than what is normally available.

Nicholas Tantra, Web Systems Analyst at DMIRS

The best thing about Data Engineering Camp was the dedication of the teaching staff to ensure concepts, skills and tools were understood by the cohort. This was not always simple, especially since there are a wide variety of complex tools in the market, and naturally students have varying levels of knowledge and familiarity with these tools. I would strongly recommend the Data Engineering Camp to those looking to break into Data Engineering.

Anoop Ananth, Senior Data Engineer at Data Driven

Enquire or download brochure

  Get our bootcamp brochure 

  Get our curriculum week by week
 
  Get our pricing information
 
  Speak to our enrolments team 

Frequently asked questions

Bootcamp students have to be comfortable with basic Python and SQL programming concepts, since the bootcamp is fast-paced and we cover a lot of ground. We also recommend that candidates have at least 1 year of working experience before enrolling. 

Computer requirements  

  • Apple MacBook running macOS Catalina or later.
  • Windows PC running Windows 10 or later. 

Minimum hardware requirements

  • 16 GB of RAM
  • i5 CPU
  • 50 GB of free HDD space 

Time commitment

  • There are 9 hours of class time each week (lectures, hands-on labs, group projects).
  • We provide 2 hours of optional support. 
  • Students typically spend between 3 and 8 hours outside class working on projects or revising topics. 
  • We therefore recommend budgeting between 12 and 18 hours per week when enrolling in this bootcamp. 

Yes. To request reimbursement, make a copy of our reimbursement template and send it to your manager. 

Yes, students who complete 2 out of 3 projects with a passing grade will receive a certificate of completion.  

The course is delivered virtually through Zoom to give our students flexibility. Our Zoom classes consist of live lectures and hands-on labs with instructor guidance. Slack is used for student and instructor communications.  

All classes are recorded in case you are not able to attend.