EDUCATING DATA ENGINEERS OF THE FUTURE

Australia's Leading Data Engineering Bootcamp

By Australia’s most reviewed Data Engineering Instructor, Jonathan Neo

14 Weeks, Part-Time

Learn the skills you need to become a data engineer with the modern data stack in 14 weeks, through part-time classes that fit around your busy schedule.

Live Virtual Classroom

Learn from anywhere that is convenient for you. Our Zoom classes consist of live lectures and hands-on labs with instructor guidance.

Next cohort commences January 2023

Become a highly effective data engineer with the modern data stack in 14 weeks. 


I have trained more than 100 data professionals

Jonathan Neo – Data Engineer & Bootcamp Instructor 

Jonathan is a Data Engineer at Canva where he is building data platforms to empower people to unlock insights from 75 million monthly active users. He has previously worked at EY, Telstra Purple, and Mantel Group, where he has led data engineering teams, built data engineering platforms for ASX-100 customers, and developed new products and businesses.
Since 2020, Jonathan has trained over 100 students through data analytics bootcamps and courses. He also hosts the Perth Data Engineering monthly meetup group with over 300 members.


The lack of data engineers in a booming sector

More companies are becoming data-driven at a time when data engineering talent is in short supply.

Data engineers are data professionals who specialise in the movement and enrichment of data so that it can be used effectively by others in the organisation.

With 2.5 quintillion bytes of data being produced every day, companies are looking to data engineers to harness that data and make it usable. However, there is a shortage of data engineers today, and the demand for data engineers has outstripped the supply.

In fact, the ratio of data engineers to job postings is one of the lowest when compared to other technology roles, validating the sentiment felt by many companies that hiring a data engineer is like finding a needle in a haystack.

This is good news for professionals looking to transition into data engineering. Salaries for data engineers now exceed those of their counterparts in other data roles.

Market opportunity

15:1

Software developer

There are 15 Software Developers for every 1 Software Developer job. The average annual salary for Software Developer jobs in Australia ranges from $80k to $120k, with salaries going as high as $160k.

5:1

Data analyst

There are 5 Data Analysts for every 1 Data Analyst job. The average annual salary for Data Analyst jobs in Australia ranges from $70k to $120k, with salaries going as high as $150k.

2:1

Data scientist

There are 2 Data Scientists for every 1 Data Scientist job. The average annual salary for Data Scientist jobs in Australia ranges from $120k to $140k, with salaries going as high as $180k.

1:1

Data engineer

There is only 1 Data Engineer for every 1 Data Engineer job. Every Data Engineer would need to leave their current job to fulfil the demand for Data Engineers. The average annual salary for Data Engineer jobs in Australia ranges from $130k to $150k, with salaries going as high as $180k.

Curriculum

Master the skills to become a highly effective data engineer with the modern data stack in 14 weeks. 

Data architectures
Go behind the curtain and understand data architectures and jargon like Online Transactional Processing (OLTP) vs Online Analytical Processing (OLAP), data warehouses, data lakes, data lakehouses, and data mesh. Understand the role of a data engineer in the organisation and in a data team.

Python for data engineering
Learn how to perform Extract Transform Load (ETL) with Python using popular libraries like Pandas and SQLAlchemy. Build your own connectors to extract data from APIs, files and databases. Transform data using Pandas. Build your own connectors to load data into data warehouses using SQLAlchemy. Automate pipelines to run on a time-based schedule using cron expressions. 
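
For a taste of what this module covers, here is a minimal ETL sketch using Pandas and SQLAlchemy (the API endpoint, column names, table name, and connection string are illustrative placeholders):

    import pandas as pd
    import requests
    from sqlalchemy import create_engine

    # Extract: pull raw records from an API (placeholder endpoint)
    response = requests.get("https://api.example.com/orders")
    response.raise_for_status()
    df = pd.DataFrame(response.json())

    # Transform: clean and enrich the data with Pandas
    df["order_date"] = pd.to_datetime(df["order_date"])
    df["total_amount"] = df["quantity"] * df["unit_price"]

    # Load: write the result into a target database via SQLAlchemy
    engine = create_engine("postgresql://user:password@localhost:5432/warehouse")
    df.to_sql("orders", con=engine, if_exists="replace", index=False)

A script like this can then be scheduled with a cron expression (for example, 0 * * * * to run it every hour).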

SQL for data engineering 
Learn about the Extract Load Transform (ELT) paradigm. Build ELT pipelines using Python and SQL. 
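
As a rough sketch of the ELT pattern (table and column names are illustrative): the raw data is loaded into the database first, and the transformation is expressed in SQL and executed from Python.

    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://user:password@localhost:5432/warehouse")

    # Transform inside the database: the raw table has already been loaded
    with engine.begin() as conn:
        conn.execute(text("DROP TABLE IF EXISTS staging_orders"))
        conn.execute(text("""
            CREATE TABLE staging_orders AS
            SELECT
                order_id,
                CAST(order_date AS DATE)  AS order_date,
                quantity * unit_price     AS total_amount
            FROM raw_orders
        """))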

Unit testing 
Learn how to write unit tests using PyTest to validate your Python code. 
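
A minimal sketch of a PyTest unit test (the transform function is a hypothetical example, not part of the course material):

    import pandas as pd

    def transform(df: pd.DataFrame) -> pd.DataFrame:
        """Example transformation: derive a total_amount column."""
        df = df.copy()
        df["total_amount"] = df["quantity"] * df["unit_price"]
        return df

    def test_transform_adds_total_amount():
        # Arrange: build a small input DataFrame
        df = pd.DataFrame({"quantity": [2, 3], "unit_price": [10.0, 5.0]})
        # Act: run the transformation under test
        result = transform(df)
        # Assert: the derived column has the expected values
        assert result["total_amount"].tolist() == [20.0, 15.0]

Running pytest from the command line discovers and executes any function whose name starts with test_.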

Docker for data engineering
Learn how to use Docker to deploy your ETL/ELT pipelines. 

Introduction to cloud computing
Learn about cloud computing and how to host storage, compute and database resources on Amazon Web Services (AWS).

Git for data engineering 
Learn how to use git, a popular source control management tool, to manage and track changes to your code. Learn about git best practices like branching, pull requests and resolving merge conflicts. Learn how to use GitHub as your source control provider.

Writing documentation 
Writing good documentation is both an art and a skill; it helps other data engineers use and extend your code. Here, you’ll learn how to write good documentation using markdown and code commenting.

Project 1
Apply what you have learnt on a group project. Create automated ETL/ELT pipelines that take raw data (from APIs, files and databases), perform data transformation and enrichment, and load data into conformed schemas. Write unit and integration tests to validate the solution. Learn to use git in your team, apply branching strategies and resolve merge conflicts. Deploy your solution locally or on the cloud. 

Airbyte for data integration 
Learn how to use Airbyte, a popular open-source data integration tool, to simplify the process of integrating data from many sources to many destinations. Build your own custom Airbyte connectors to integrate custom data sources that are unique to your use case. Package your custom connector using Docker. Deploy Airbyte locally on your machine, and on AWS to leverage the power of cloud computing.

Databricks and Spark – Data lakehouse pattern for data transformation
Learn how to perform ETL/ELT on very large datasets using Databricks and Spark, a high-performance distributed processing engine. Write the data into data lakes (AWS S3 buckets) using Spark. Understand the Spark architecture and query execution plans. Learn how to build robust and reliable data pipelines by performing data quality tests using great_expectations, a popular data quality testing library.
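
A minimal PySpark sketch of this kind of pipeline (the S3 paths and column names are placeholders, and the simple null check stands in for the richer great_expectations test suites covered in class):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

    # Read raw files from the data lake (placeholder S3 path)
    raw = spark.read.option("header", "true").csv("s3://my-data-lake/raw/orders/")

    # Transform: cast types and derive a new column
    orders = (
        raw.withColumn("quantity", F.col("quantity").cast("int"))
           .withColumn("unit_price", F.col("unit_price").cast("double"))
           .withColumn("total_amount", F.col("quantity") * F.col("unit_price"))
    )

    # A basic data quality check (great_expectations provides a richer version of this)
    assert orders.filter(F.col("order_id").isNull()).count() == 0

    # Write the curated dataset back to the lake in a columnar format
    orders.write.mode("overwrite").parquet("s3://my-data-lake/curated/orders/")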

Snowflake and dbt – Data warehouse pattern for data transformation
Learn how to perform ETL/ELT on big data using Snowflake, a Massively Parallel Processing (MPP) database, and Data Build Tool (dbt). Learn how to create Directed Acyclic Graphs (DAGs) using dbt to manage dependencies between transformation tasks. Also learn how to build your own custom functions using dbt macros, and create robust and reliable data pipelines using dbt data quality tests.

Project 2
Apply what you have learnt on a group project. Create automated ETL/ELT pipelines that take raw data from big datasets using Airbyte, perform data transformation and enrichment using either the data lakehouse (Databricks) or data warehouse pattern (Snowflake), and load data into conformed schemas. Write data quality tests (great_expectations or dbt) to validate the integrity of the data. Deploy your solution locally or on the cloud. 

Airflow – Orchestrate data pipelines 
Create data orchestration pipelines using Airflow, a popular open-source data orchestration and scheduling tool. Build in data quality tests as part of the Airflow pipeline. Deploy Airflow locally and on AWS to leverage the power of cloud computing. 
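
A minimal sketch of an Airflow DAG (the task functions are placeholders; in the bootcamp these steps call out to tools like Airbyte, Databricks, and Snowflake):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_data():
        print("extracting data...")  # placeholder task

    def run_quality_checks():
        print("running data quality tests...")  # placeholder task

    with DAG(
        dag_id="daily_elt_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",  # run once per day
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_data", python_callable=extract_data)
        quality = PythonOperator(task_id="quality_checks", python_callable=run_quality_checks)

        extract >> quality  # quality checks run only after extraction succeeds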

Airflow – ELT pipelines 
Use Airflow to integrate with Airbyte, Databricks and Snowflake in building ELT pipelines.

Data modelling
Learn about different data modelling techniques like dimensional modelling (star schema vs snowflake schema), and data vault. Learn how to create dimensional models using dbt on Snowflake to make the data easier to analyse.
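
As a rough illustration of dimensional modelling (shown here with Pandas for brevity; in class the models are built with dbt on Snowflake, and the sample data below is invented), a flat orders table can be split into a dimension and a fact:

    import pandas as pd

    # A flat, denormalised orders table (illustrative data)
    orders = pd.DataFrame({
        "order_id": [1, 2, 3],
        "customer_name": ["Alice", "Bob", "Alice"],
        "customer_city": ["Perth", "Sydney", "Perth"],
        "total_amount": [20.0, 15.0, 42.0],
    })

    # Dimension: one row per customer, with a surrogate key
    dim_customer = (
        orders[["customer_name", "customer_city"]]
        .drop_duplicates()
        .reset_index(drop=True)
    )
    dim_customer["customer_key"] = dim_customer.index + 1

    # Fact: measures plus foreign keys to the dimensions (star schema)
    fact_orders = orders.merge(dim_customer, on=["customer_name", "customer_city"])[
        ["order_id", "customer_key", "total_amount"]
    ]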

Data visualisation 
Connect to your dimensional models in Snowflake using Redash, a popular open-source data visualisation tool, and create dashboards to generate insights from the data. Deploy Redash locally and on AWS to leverage the power of cloud computing. 

Kafka – Producing and consuming data streams 
Learn how to use Kafka to produce and consume data streams. Connect custom applications to consume from Kafka streams and visualise real-time data. Deploy Kafka locally and on AWS to leverage the power of cloud computing.
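
A minimal sketch of producing and consuming a stream with the kafka-python client (the broker address, topic name, and event payload are placeholders):

    import json

    from kafka import KafkaProducer, KafkaConsumer

    # Produce: send JSON events to a topic
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("page_views", {"user_id": 42, "page": "/pricing"})
    producer.flush()

    # Consume: read events back from the same topic
    consumer = KafkaConsumer(
        "page_views",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)  # e.g. {'user_id': 42, 'page': '/pricing'}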

kSQL – Transforming data streams 
Learn how to transform and aggregate Kafka streams using kSQL.

Druid – Analysing data streams
Read data from Kafka streams into Druid, a real-time database optimised for performing analytics. Create analytical SQL queries in Druid, and connect to Druid from Redash and build dashboards. Deploy Druid locally and on AWS to leverage the power of cloud computing.

GitHub Actions – CI/CD Pipelines
Learn how to create Continuous Integration (CI) and Continuous Deployment (CD) pipelines using GitHub Actions, a popular CI/CD tool, to automate deployment of code to a target environment. 

Career guidance 
Learn how to build a strong data engineering portfolio and online profile (GitHub and LinkedIn), and tips for landing your first job in data engineering. 

Capstone project 

The capstone project is used to showcase all the skills and technologies you have learnt throughout the bootcamp to potential employers. Your project should be deployable locally or on the cloud so that users (employers) can interact with your solution and look through your code on GitHub. 

Option A: Batch ETL solution 
Combine all the techniques and technologies you have learnt to create a batch ETL solution using Airflow, Airbyte, Databricks or Snowflake, dimensional modelling and Redash. Host your solution locally or on AWS. Deploy your solution using CI/CD pipelines from GitHub Actions.

Option B: Streaming ETL solution  
Combine all the techniques and technologies you have learnt to create a streaming ETL solution using Kafka, Druid and Redash. Host your solution locally or on AWS. Deploy your solution using CI/CD pipelines from GitHub Actions.

Course wrap-up
You have made it to the end of the bootcamp! 

You’ve done an amazing job to get here! It’s time to showcase your capstone project and the skills you have learnt to industry professionals. This is a great way to network and receive mentorship from other professionals who are in the field of data engineering. 

Technologies covered

Delivery

Online
The course is delivered online through Zoom for flexibility and the COVID safety of our participants and instructors. Our Zoom classes consist of live lectures and hands-on labs with instructor guidance. Slack is used for student and instructor communications.

Outside of working hours
The course is delivered from 6:00pm to 9:00pm (AWST) / 8:00pm to 11:00pm (AEST) on Monday, Tuesday, and Thursday, so as not to interfere with your day job.
In addition, we provide optional support hours from 8:00am to 10:00am (AWST) / 10:00am to 12:00pm (AEST) on Saturday. Participants are encouraged to bring their questions and code problems.

Admissions

Process

Prerequisites

  • Python – you will need to be comfortable with variables, lists, dictionaries, functions, loops, conditionals, and using libraries.   
  • SQL – you will need to be comfortable with Data Manipulation Language (DML) SQL queries (select, group by, where, having, insert, delete, update) and Data Definition Language (DDL) SQL queries (create table, alter table).

After chatting with our admissions team, you will receive a 15-minute quiz to validate your knowledge of Python and SQL.

Learn the skills to become a data engineer now

Submit an enquiry via the enquiry form. Next cohort commences January 2023. 

FAQs

Most frequently asked questions and answers

Who should apply?

We require candidates to be comfortable with basic Python and SQL programming, since the bootcamp is fast paced and we cover a lot of ground. We recommend that candidates have at least 1 year of experience in a technology-related role (e.g. data analyst, database administrator, software developer, IT support engineer, systems engineer).

Computer requirements  

  • Apple MacBook, running macOS Catalina or later.
  • Windows PC, running Windows 10 or later.

Minimum hardware requirements

  • 8 GB of RAM
  • i5 CPU
  • 50 GB of HDD free space 

Time commitment

  • There are 9 hours of class time each week (lectures, hands-on labs, group projects).
  • We provide 2 hours of optional support hours each week.
  • Students typically spend between 3 and 8 hours outside class working on projects or revising topics.
  • We therefore recommend budgeting between 12 and 18 hours per week when enrolling in this bootcamp.

Do you offer corporate training?

Yes, we offer the data engineering bootcamp curriculum to companies that are looking either to train new data engineers or to transition their existing technology teams into data engineering. See here for full details.

Still have questions?