Transform your career trajectory with our Data Engineering Bootcamp

Learn from leading industry experts working at the world's most innovative companies

16 weeks of part-time classes that adapt to your busy schedule

Learn from anywhere in our virtual classroom with live lectures and hands-on labs



Limited Seats Remaining

Curriculum

Master the skills to become an effective data engineer with the modern data stack in 16 weeks. Take a comprehensive look at the curriculum below.

Master the core ETL concepts and primitives used in data engineering. Abstractions and tools such as Airflow and Airbyte are built on top of the concepts and primitives taught in this topic (a short Python sketch follows the topic list below).

  • Python virtual environments 
  • ETL with interactive Jupyter Notebooks
  • Data extraction patterns from APIs (full vs incremental) 
  • Data transformation using dataframes (Pandas) 
  • Data loading patterns to files (CSV, Parquet) and databases (SQLAlchemy, PostgreSQL) 
  • Functional Programming with Python 
  • Modular programming using Python modules
  • Object Oriented Programming with Python
  • Logging with Python (PostgreSQL) 
  • Unit and integration testing with Python 
  • Code linting with Python
  • Metadata config pipeline (YAML) 
  • Metadata logging to database 
  • Cron scheduling 
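
A minimal, illustrative sketch of these primitives: extract from a hypothetical JSON endpoint, transform with Pandas, and load into PostgreSQL via SQLAlchemy. The URL, table, column names, and connection string are placeholders rather than bootcamp materials.

import pandas as pd
import requests
from sqlalchemy import create_engine

def extract(url: str) -> pd.DataFrame:
    # Full extract from a hypothetical JSON API endpoint
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return pd.json_normalize(response.json())

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Example transformation: keep completed orders and derive a revenue column
    df = df[df["status"] == "completed"].copy()
    df["revenue"] = df["quantity"] * df["unit_price"]
    return df

def load(df: pd.DataFrame, table: str, connection_uri: str) -> None:
    # Overwrite (full) load into PostgreSQL using SQLAlchemy
    engine = create_engine(connection_uri)
    df.to_sql(table, engine, if_exists="replace", index=False)

if __name__ == "__main__":
    orders = transform(extract("https://example.com/api/orders"))
    load(orders, "orders", "postgresql://user:password@localhost:5432/warehouse")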

Continue down the path of mastering the core concepts and primitives used in data engineering. In this topic, we learn the ELT pattern, a fairly recent addition to data engineering born out of the explosion in cloud adoption (illustrated with a SQL templating sketch after the list below).

  • Data extraction patterns from databases (full vs incremental vs CDC) 
  • Data loading patterns to databases (overwrite vs insert vs upsert vs merge) 
  • SQL transformations in databases (PostgreSQL)
  • Jinja for SQL templating (Jinja, Python, SQLAlchemy)  
  • Directed acyclic graphs (DAGs) with Python 
  • SQL Common Table Expressions (CTEs) 
  • SQL Window Functions 
  • Modularising ELT pipelines 
  • Unit testing ELT pipelines 
  • Logging for ELT pipelines (PostgreSQL) 
  • Metadata config for ELT pipelines (YAML) 
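
A minimal sketch of SQL templating with Jinja, combining a CTE and a window function in one templated transform. The table and column names are made up for illustration; in practice the rendered SQL would be executed against PostgreSQL with SQLAlchemy.

from jinja2 import Template

# Templated transform: a CTE plus a window function, rendered per source table
TRANSFORM_SQL = Template("""
with ranked_orders as (
    select
        customer_id,
        order_id,
        order_total,
        row_number() over (
            partition by customer_id
            order by order_total desc
        ) as order_rank
    from {{ source_table }}
)
select customer_id, order_id, order_total
from ranked_orders
where order_rank <= {{ top_n }};
""")

print(TRANSFORM_SQL.render(source_table="staging.orders", top_n=3))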

Master the concepts to containerize, build, and deploy ETL pipelines into a production environment hosted on the cloud. Enable code versioning and team collaboration best practices through Git (a small Boto3 example follows the topic list below).

  • Git:
    • Git version control system 
    • Git workflow (add, commit, push, merge, pull) 
    • Git branching 
    • Github pull requests 
  • Docker:
    • Computer vs Virtual Machine vs Docker 
    • Docker image and container 
    • Docker commands 
    • Dockerfile 
    • Docker volumes 
    • Docker compose
    • Docker repository 
    • Containerize an ETL pipeline 
  • Cloud (AWS): 
    • Identity and Access Management (IAM) – Users, Policies, Groups, and Roles 
    • Relational Database Service (RDS)
    • Simple Storage Service (S3) 
    • AWS CLI and Boto3 
    • Elastic Container Registry (ECR) 
    • Elastic Container Service (ECS) 
    • Deploy and schedule an ETL pipeline on ECS
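
As a small taste of the AWS tooling in this topic, the sketch below uploads a pipeline output to S3 with Boto3 and lists what has landed. The bucket name, file paths, and prefix are placeholders, and credentials are assumed to come from the standard AWS credential chain.

import boto3

# Assumes credentials are already configured via the AWS CLI or environment variables
s3 = boto3.client("s3")

# Upload a local extract to a (hypothetical) bucket used by the pipeline
s3.upload_file(
    Filename="data/orders.parquet",
    Bucket="my-etl-bucket",
    Key="raw/orders/orders.parquet",
)

# List what has landed under the raw/ prefix
response = s3.list_objects_v2(Bucket="my-etl-bucket", Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"])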

Create an end-to-end ETL pipeline using extract, load, and transform patterns covered in earlier topics. 

  • Extract data from a data source of your choosing, load data into a data store of your choosing, and transform data to support a use-case of your choosing. 
  • Apply metadata logging, metadata configuration, unit and integration testing, and cron scheduling for your pipeline. 
  • Use Git and GitHub to apply code versioning, branching, and pull requests. 
  • Containerize your ETL pipeline using Docker, and deploy the pipeline on the cloud.

Most businesses generate and store data in multiple systems such as Customer Relationship Management (CRM) systems, Order Management Systems (OMS), accounting systems, marketing platforms, and many more. Handcrafting the Extract and Load logic for each system is a tedious process that can be automated using data integration tools such as Airbyte (a brief Airbyte API example follows the topic list below).

  • Airbyte sources, destinations, and connections 
  • Airbyte extract patterns (full, incremental, CDC) 
  • Airbyte load patterns (overwrite, insert, upsert, merge) 
  • Octavia CLI 
  • Airbyte API 
  • Airbyte Custom Connectors 
  • Deploying Airbyte on AWS EC2
  • End-to-end ELT pipeline with Airbyte on AWS
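
To give a flavour of the Airbyte API, the sketch below triggers a manual sync for an existing connection on a locally deployed Airbyte instance. The host, port, and connection ID are placeholders, and the endpoint path reflects Airbyte's configuration API at the time of writing, so check the API documentation for your deployment.

import requests

AIRBYTE_URL = "http://localhost:8000/api/v1"  # local Airbyte deployment (placeholder)
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder connection ID

# Trigger a manual sync for an existing Airbyte connection
response = requests.post(
    f"{AIRBYTE_URL}/connections/sync",
    json={"connectionId": CONNECTION_ID},
    timeout=30,
)
response.raise_for_status()
print(response.json()["job"]["status"])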

As businesses scale, their data volumes increase until it is no longer viable to process data on a single compute instance. To solve scale issues, we look at technologies like Snowflake which are designed to perform analytical processing on large volumes of data. 

We streamline our Transform pipeline development process using dbt, which popularized a subfield of data engineering known as Analytics Engineering (a short Snowflake example follows the topic list below).

  • OLAP vs OLTP 
  • Snowflake architecture 
  • Snowflake RBAC 
  • Loading data into Snowflake 
  • Parsing JSON with Snowflake 
  • Snowflake micro-partitions and clustering
  • dbt project 
  • dbt commands (run, test, build, list)
  • Writing and running a dbt model
  • dbt seeds, tests, and macros
  • dbt docs
  • dbt in production (profiles, targets, and deploy on AWS)
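
As a small illustration of querying semi-structured data in Snowflake from Python, the sketch below connects with the Snowflake connector and parses a JSON (VARIANT) column. The account, credentials, warehouse, and table and column names are all placeholders.

import snowflake.connector

# Connection details are placeholders; substitute your own account and credentials
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="transforming",
    database="analytics",
    schema="raw",
)

# Parse a JSON (VARIANT) column using Snowflake's path and cast syntax
query = """
    select
        payload:customer:id::string as customer_id,
        payload:order_total::number as order_total
    from raw_orders
    limit 10;
"""

for customer_id, order_total in conn.cursor().execute(query):
    print(customer_id, order_total)

conn.close()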

Data engineering does not exist in a vacuum. As data engineers, we transform and model data for end-user consumption to power use-cases such as machine learning, business intelligence, and analytics. To model the data with the software engineering principles of modularity and reusability, we apply data modelling techniques such as dimensional modelling. To enable end users to slice and dice the models that we produce, we provide a layer on top of the data warehouse known as the semantic layer (a star-schema query sketch follows the topic list below).

  • Normalization vs Denormalization
  • Data modelling concepts: 
    • Dimensional modelling by Ralph Kimball (also known as Star Schema) 
    • Data warehouse modelling by Bill Inmon 
    • Data vault modelling by Dan Linstedt 
    • One Big Table (OBT) 
  • Applied dimensional modelling using dbt: 
    • Fact and dimension tables
    • dbt snapshots and Slowly Changing Dimensions (SCD) 
    • Transactional fact table 
    • Snapshot fact table 
    • Accumulating snapshot fact table 
    • Factless fact table 
    • Incremental fact table load 
  • Semantic modelling: 
    • Semantic modelling concepts and tools 
    • Semantic modelling and metrics using Preset 
    • Preset chart
    • Preset dashboard
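
To make the dimensional modelling idea concrete, here is a sketch of a typical star-schema query: a fact table joined to its dimensions, sliced by dimension attributes and aggregated over the fact's measures. The fact and dimension tables, columns, and connection string are hypothetical, and the query is executed with SQLAlchemy.

from sqlalchemy import create_engine, text

# Placeholder connection; in practice this would point at the warehouse
engine = create_engine("postgresql://user:password@localhost:5432/warehouse")

# Revenue by month and customer segment from a star schema
STAR_SCHEMA_QUERY = text("""
    select
        d.calendar_month,
        c.customer_segment,
        sum(f.order_total) as revenue
    from fact_orders f
    join dim_date d on f.date_key = d.date_key
    join dim_customer c on f.customer_key = c.customer_key
    group by d.calendar_month, c.customer_segment;
""")

with engine.connect() as connection:
    for row in connection.execute(STAR_SCHEMA_QUERY):
        print(row)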

Spark is a distributed data processing system capable of processing large volumes of data. Databricks provides an ecosystem of tooling to enable Spark to run for multiple use-cases such as data engineering, stream processing, and machine learning. 

We discover how Spark uses the separation of storage and compute to enable scale, learn about the Delta file format, use Spark for data engineering, and apply data quality tests using Great Expectations (a short PySpark sketch follows the topic list below).

  • Big data processing architectures
  • Spark internals and core concepts 
  • Spark reading and writing 
  • Spark SQL 
  • Spark DataFrame 
  • Spark joins, group by, and aggregation 
  • Spark UDF 
  • Spark query plan and optimization 
  • Spark partition keys  
  • ACID file formats (delta file format) 
  • Data orchestration using Databricks Workflows
  • Manage the Databricks workspace using API and CLI 
  • Data quality testing with Great Expectations
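
A minimal PySpark sketch of the read, transform, and Delta-write patterns covered in this topic, assuming a Spark cluster with the Delta Lake package available. The paths and column names are illustrative.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-revenue").getOrCreate()

# Read raw order data (path and schema are placeholders)
orders = spark.read.parquet("s3a://my-lake/raw/orders/")

# Group and aggregate: daily revenue per customer
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("ordered_at"))
    .groupBy("order_date", "customer_id")
    .agg(F.sum("order_total").alias("revenue"))
)

# Write the result as a Delta table (requires the delta-spark package on the cluster)
daily_revenue.write.format("delta").mode("overwrite").save("s3a://my-lake/silver/daily_revenue/")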

Create an end-to-end ETL pipeline capable of processing large volumes of data. 

  • Extract data from a data source of your choosing, load data into a data store of your choosing, and transform data to support a use-case of your choosing. 
  • Use Airbyte to perform data integration, an abstraction layer over low-level Extract and Load patterns. 
  • Transform data using compute engines from the Data Warehousing paradigm (Snowflake), or the Data Lakehouse paradigm (Databricks). 
  • Apply transformation logic using traditional ETL patterns, or Analytics Engineering patterns using dbt. 
  • Apply metadata logging, metadata configuration, unit and integration testing, and cron scheduling for your pipeline. 
  • Use Git and GitHub to apply code versioning, branching, and pull requests. 
  • Deploy your end-to-end ETL pipeline on the cloud.

 

Data orchestration enables data engineers to stitch together the different parts of ETL into a single cohesive pipeline. Data orchestration makes it easy to trigger, schedule, monitor, and configure alerts for the pipelines. Data orchestrators like Airflow come with plugins or providers that connect to existing tools in your data stack like Airbyte, dbt, Snowflake, and Databricks, so that you can easily orchestrate steps between them (a minimal DAG example follows the topic list below).

  • Data orchestration architecture and patterns 
  • Airflow DAGs, Operators and Tasks
  • Airflow schedule 
  • Airflow patterns for catchup, idempotence, backfill, and branching
  • Airflow variables, connections, hooks, and providers 
  • Airflow cross communication (XComs) 
  • Airflow sensors
  • Airflow Dynamic Task Mapping, and taskflow
  • Trigger Rules
  • Watcher Pattern
  • Deploy Airflow locally, and on AWS using EC2 or MWAA
  • Extending the Airflow Docker image
  • Airflow providers for Airbyte, dbt, Databricks, Snowflake, and Slack alerts
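
A minimal Airflow DAG sketch using the TaskFlow API (Airflow 2.x assumed), with a daily schedule and two dependent tasks. The task bodies are placeholders standing in for the Airbyte, dbt, or Spark steps you would orchestrate in practice.

from datetime import datetime

from airflow.decorators import dag, task

@dag(
    schedule="0 2 * * *",          # run daily at 02:00
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["bootcamp"],
)
def daily_elt():
    @task
    def extract_load() -> str:
        # Placeholder for an Airbyte sync or a custom extract-and-load step
        return "raw.orders"

    @task
    def transform(table: str) -> None:
        # Placeholder for a dbt run or SQL transformation against the loaded table
        print(f"transforming {table}")

    transform(extract_load())

daily_elt()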

Enable real-time insights from fast-moving data. Learn the core concepts and primitives of stream processing using Kafka, and deploy Kafka topics on Confluent Cloud. Integrate real-time events into Clickhouse, a real-time database, and perform data transformation in Clickhouse. Define and test Clickhouse materialized views using dbt (a small Kafka producer sketch follows the topic list below).

  • Streaming concepts 
  • Kafka key concepts
  • Kafka broker and topics  
  • Kafka CLI
  • Creating a Python Kafka producer
  • Creating a Python Kafka consumer
  • Stream analytics using ksqlDB
  • Deploy Kafka on Confluent Cloud
  • Real-time databases with Clickhouse 
  • Clickhouse architecture and internals 
  • Use Kafka Connect to integrate data into Clickhouse
  • Tables, views, and materialized views on Clickhouse
  • dbt to test and deploy Clickhouse objects 
  • Real-time dashboard with Preset, Clickhouse, and Kafka
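
A minimal Python Kafka producer sketch using the confluent-kafka client. The broker address, topic name, and event payload are placeholders; on Confluent Cloud you would also supply the security protocol and API key settings for your cluster.

import json

from confluent_kafka import Producer

# Broker address is a placeholder for a local Kafka broker
producer = Producer({"bootstrap.servers": "localhost:9092"})

def delivery_report(err, msg):
    # Called once per message to confirm delivery or surface an error
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()} [{msg.partition()}]")

event = {"user_id": 42, "heart_rate": 71}
producer.produce(
    topic="heart_rate_events",
    key=str(event["user_id"]),
    value=json.dumps(event),
    callback=delivery_report,
)
producer.flush()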

As the data engineering team grows, so does the code complexity. To provide assurance that code changes are correct, automated continuous integration pipelines test and verify a data engineer’s changes in a separate branch-based environment. After the code has been validated, it can be automatically built and released into deployment environments such as staging and production (a pytest example follows the topic list below).

  • Principles of DataOps 
  • Continuous integration pipelines: 
    • Unit testing 
    • Code linting tests 
    • Data quality testing 
    • Branch-based testing environments 
    • CI pipelines for dbt 
    • CI pipelines for Python ETL 
  • Continuous deployment pipelines: 
    • Containerize and build 
    • Deployment environments 
    • Deploy using Infrastructure as Code (IaC)
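
As an illustration of the unit-testing step in a CI pipeline, here is a small pytest sketch for an invented Pandas transformation. A CI runner such as GitHub Actions would run pytest against code like this on every pull request.

import pandas as pd

def add_revenue(df: pd.DataFrame) -> pd.DataFrame:
    # Example transform under test: derive revenue from quantity and unit price
    df = df.copy()
    df["revenue"] = df["quantity"] * df["unit_price"]
    return df

def test_add_revenue_computes_expected_values():
    raw = pd.DataFrame({"quantity": [2, 3], "unit_price": [10.0, 5.0]})
    result = add_revenue(raw)
    assert result["revenue"].tolist() == [20.0, 15.0]

def test_add_revenue_does_not_mutate_input():
    raw = pd.DataFrame({"quantity": [1], "unit_price": [9.99]})
    add_revenue(raw)
    assert "revenue" not in raw.columns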

Showcase all the skills and technologies you have learnt throughout the bootcamp to future employers. Implement either a lambda or kappa architecture with ETL pipelines capable of processing large volumes of data.

  • Extract data from a data source of your choosing, load data into a data store of your choosing, and transform data to support a use-case of your choosing. 
  • Apply unit testing, data quality testing, and monitoring over your ETL pipelines 
  • Use Git and GitHub to apply code versioning, branching, and pull requests 
  • Implement CI pipelines to automatically test your code in a branch-based environment 
  • Implement CD pipelines to automatically deploy your end-to-end ETL pipeline on the cloud 
  • Present your capstone projects at the Demo Day to data engineering experts working in startups or large companies


Real-world projects

Graduate with a portfolio of professional data engineering projects that you can showcase to the world. Take a look at some projects below from our most recent Data Engineering Camp cohort. 

Demo day

Kickstart your career in data engineering by presenting your capstone project to Data Engineers and representatives working at startups and large companies. 

Capstone project: Formula One

A batch data pipeline that analyses Formula One races using AWS, Airbyte, Databricks, Delta Lake, Dagster, and Preset.

Douglas Fugimoto, Senior Analytics Engineer at Toptal (Brazil)

Capstone project: Heart Rate Stream

A batch and streaming data pipeline that analyses real-time Heart Rate data using Kafka, Confluent, AWS, Airbyte, Snowflake, Dagster, dbt, and Tableau.

Madyar Marat, Operations Research Specialist at AHOY (Dubai)

Join our next data engineer bootcamp

April 2024 Cohort

  16 weeks, 29 April 2024 – 19 August 2024

  Monday, Tuesday, and Thursday
10:00am – 1:00pm (UTC)

  SOLD OUT

September 2024 Cohort

  16 weeks, 2 September 2024 – 23 December 2024

  Monday, Tuesday, and Thursday
10:00am – 1:00pm (UTC)

  30 Seats available

Our industry-leading instructors

Chris Dilger

Chris Dilger, a seasoned Senior Engineer at Versent, a leading cloud consultancy, excels in building data pipelines with a focus on AWS services and Snowflake. Beyond his data engineering prowess, Chris seamlessly transitions between front and backend engineering roles, contributing to versatile solutions that bridge data architecture and application development. Formerly a Junior Consultant in Data at the Data Experience, Chris brings a wealth of experience to his current position. His dedication to crafting efficient data solutions, coupled with a passion for technology, positions him as a key contributor in the dynamic landscape of cloud consulting at Versent.

Jay Zern Ng

Jay is a Senior Data Engineer at Flatiron Health, a leading health tech company based in NYC, where he spearheads engineering for both custom and syndicated products. With an M.S. in Data Science from Columbia University and five years of formal training in Machine Learning, Jay brings a wealth of knowledge and experience to the Bootcamp. Beyond his professional commitments, Jay is passionate about education and empowerment in the tech community. He runs a popular YouTube channel with data engineering and lifestyle content, boasting 7,000 subscribers and 250,000 views (as of writing). Check out his channel at @jayzern.

Rashid Mohammed

Rashid is a Data Engineer at MA Financial Group and a Data Engineering Consultant to Blike, a UK e-bike solution. He has a strong understanding of modern data platforms and experience building modern data pipelines. Rashid comes from a financial and banking background with a history in the Microsoft suite, helping to deploy Power Platform enterprise solutions. He’s a learner himself and gets immense satisfaction from teaching.

Jonathan Neo

Jonathan is a Data Engineer at Canva, where he builds data platforms that empower teams to unlock insights into their products. He has previously worked at EY, Telstra Purple, and Mantel Group, where he led data engineering teams, built nearly a dozen data platforms, and developed new products and offerings. Jonathan has taught over a hundred data professionals who are now working at leading technology companies around the world.

Prerequisites

Python – You are comfortable with variables, lists, dictionaries, functions, loops, conditionals, and using Python libraries. 

				
understand = True  # set this based on your own self-assessment

if understand:
    print("You understand the basics")
else:
    print("Take some time to learn the basics")

SQL – You are comfortable with Data Manipulation Language (DML) such as select, group by, where, having, insert, delete, and update. You are comfortable with Data Definition Language (DDL) such as create table and alter table.

				
SELECT
    CASE
        WHEN understand = TRUE THEN 'You understand the basics'
        ELSE 'Take some time to learn the basics'
    END AS next_step
FROM (SELECT TRUE AS understand) AS self_assessment;

Support

Career services

  1:1 coaching with a career coach

Receive guidance on your data engineering career trajectory, a resume review, and preparation for interviews.

  1:1 expert advice from practitioners 

Receive expert advice from data engineering practitioners about industry trends, technology stack tradeoffs, and professional development. 

Learning assistance

  Ask questions in the live classes and office hours, and your instructor will provide answers 

  Ask questions in the Slack channel #help and your peers or instructors will provide answers 

  Work on projects in a group and hold each other accountable

Alumni community

  Alumni slack channel 

Join our alumni community slack channel and stay in touch with your peers. 

  Alumni events 

Attend alumni-only events and network with other data engineers in the industry. 

Testimonials

I would recommend the bootcamp to two types of people. First, people who are interested in a career as a data engineer. The course teaches you the fundamentals for building a data stack using modern data tooling in hands on, steady paced, rigorous manner. Second, people who work with data engineers and are interested in understanding the modern data stack and how different parts of the data engineering lifecycle work together. Within weeks, I was able to set up a Kafka pipeline, use dbt to transform data, and set up a CI/CD pipeline via GitHub Actions.

Paul Hallaste, Analytics Lead at Fidelity International (Japan)

If you're looking to break into the industry, I think this bootcamp is probably your best choice. It will help you to have a good understanding of how to combine all the tools together into the single pipeline. Using high and low level programming, build custom drivers and utilize the pre built ones, build the portfolio to showcase to the potential employers. It will also provide interview tips, in addition to the opportunity to introduce your project to multiple companies at the end of the bootcamp. If you're looking for a career change, welcome to the opportunity.

Mantas Liutkus, Data and Automation Engineer at M Solutions Corp (Canada)

I would 100% recommend Data Engineer Camp to both newbies to the discipline looking to change their careers, and seasoned pros looking to round out their knowledge. I learned something brand new, or went deeper on topics I already knew almost every single week. The value of that really can't be understated. The best thing about the bootcamp is definitely the level of detail. The bootcamp really went deep into spark internals, which I was really impressed by, and that level of detail was maintained across every single topic.

Alexander Potts, Data Engineer at Endeavour Group (Australia)

The best thing about bootcamp are the people. But to elaborate, the team not only work as data engineers in their day job, but they've put this bootcamp together to share their knowledge and grow the industry. Learning from such passionate teachers is an inspiration for any student. Furthermore, learning alongside students who volunteer to give up their evenings and weekends has allowed me to connect and network with eager to learn like minded industry professionals, which is invaluable in one's career.

Luke Huntley, Data Engineer at Western Power (Australia)

I wanted a course that could give me a comprehensive run through of what the landscape is like. [...] I think the boot camp is really great for those who are not yet confident about their skills, and for those who like to learn by doing things in a structured and guided matter with more finesse and detail than what is normally available.

Nicholas Tantra, Web Systems Analyst at DMIRS (Thailand)

I can absolutely recommend the Data Engineering Bootcamp to everyone that needs to acquire the modern data engineering skills in a fast, digestible and reliable way. In 16 weeks you will receive a perfectly developed curriculum of the most important concepts and tools in data engineering. You receive all lessons in well-built units and chapters that develop over time, and the delivery of all learning materials comes with practical examples and training sessions inside and outside of class.

Dr. Gernot G. Supp, Digital Research, Data Science & Data Engineering (Germany)

 

Enquire or download brochure

  Get our bootcamp brochure 

  Get our curriculum week by week
 
  Get our pricing information
 
  Speak to our enrolments team 

Frequently asked questions

Bootcamp students have to be comfortable with basic Python and SQL programming concepts since the bootcamp is fast paced and we cover a lot of ground. It is also recommended that candidates have at least 1 year of working experience before enrolling. 

Computer requirements  

  • Apple MacBook running macOS Catalina or above.
  • Windows PC running Windows 10 or above.

Minimum hardware requirements

  • 16 GB of RAM
  • i5 CPU
  • 50 GB of HDD free space 

Time commitment

  • There are 9 hours of class time each week (lectures, hands-on labs, group projects).
  • We provide 2 hours of optional support each week.
  • Students typically spend between 3 and 8 hours outside of class working on projects or revising topics.
  • We therefore recommend budgeting between 12 and 18 hours per week when enrolling in this bootcamp.

Yes. To request reimbursement, you can make a copy of our reimbursement template and send it to your manager.

Yes, students who complete 2 out of 3 projects with a passing grade will receive a certificate of completion.

The course is delivered virtually through Zoom for the flexibility of our students. Our Zoom classes consist of live lectures and hands-on labs with instructor guidance. Slack is used for student and instructor communication.

All classes are recorded in the event you are not able to make it to class. 

We cover topics such as network security and access control when provisioning access to the resources that we deploy on the cloud. We do not cover migration directly, e.g. how to migrate an on-premises data platform to the cloud. However, we can discuss such topics during office hours.

Yes, the course covers DataOps principles such as Data Quality testing and monitoring, and Continuous Integration and Deployment.

We offer career services which include one-on-one mentoring, a data engineering job application tutorial, a technical interview seminar, and access to instructors for career guidance throughout the course. We cannot promise a job, but we can promise to build your data engineering skillset with a modern tech stack and to help you develop your career application.

Instructors are typically available during live classes, office hours, and via online communication platforms (Slack) for personalized guidance and support.

Instructors at our data engineering bootcamp usually have a mix of industry experience and teaching backgrounds. They may have taught at universities, conducted workshops, mentored junior engineers, or worked as trainers in their previous roles. Their diverse backgrounds ensure a comprehensive and practical learning experience for students.

Data engineering courses offer comprehensive, shorter-term education suitable for beginner experience levels, while data engineering bootcamps provide intensive, hands-on training focused on practical skills for rapid career entry or upskilling.