Learn from leading industry experts working at the world's most innovative companies
16 weeks of part-time classes that adapt to your busy schedule
Learn from anywhere in our virtual classroom with live lectures and hands-on labs
Jonathan is a Data Engineer at Canva, where he is building data platforms that empower teams to unlock insights into their products. He previously worked at EY, Telstra Purple, and Mantel Group, where he led data engineering teams, built nearly a dozen data platforms, and developed new products and offerings. Jonathan has taught over a hundred data professionals who are now working at leading technology companies around the world.
Hengji is a Data Engineer at Canva, where he manages data architecture and pipelines for his group. He previously worked at Servian, a boutique consultancy focused on data solutions, before joining the Macquarie Group as an internal-facing data consultant. Hengji’s extensive consulting experience has made him an excellent listener and explainer.
Pavan is a Data Engineer at Deloitte and brings software engineering rigour to data problems for his clients. He previously worked at Commonwealth Bank as a Software Engineer before joining Tyro Payments, where he implemented streaming pipelines using Kafka. Pavan is passionate about helping students achieve their “aha” moment by making complex subjects simple.
Master the skills to become an effective data engineer with the modern data stack in 16 weeks.
Master the core concepts and primitives used in data engineering, centred around ETL. Abstractions and tools such as Airflow and Airbyte are built on top of the core concepts and primitives taught in this topic.
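For example, the extract, transform, and load steps can be written in plain Python before any framework is introduced. A minimal sketch, assuming a placeholder source API and a local SQLite database standing in for a warehouse:

import sqlite3

import pandas as pd
import requests

# Extract: pull raw records from a source system (placeholder URL)
raw_orders = requests.get("https://example.com/api/orders").json()

# Transform: clean and aggregate the data in memory
df = pd.DataFrame(raw_orders)
df["order_date"] = pd.to_datetime(df["order_date"])
daily_revenue = (
    df.groupby(df["order_date"].dt.date)["amount"].sum().rename("revenue").reset_index()
)

# Load: write the transformed result to a target database
with sqlite3.connect("warehouse.db") as conn:
    daily_revenue.to_sql("daily_revenue", conn, if_exists="replace", index=False)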
Continue down the path of mastering core concepts and primitives used in data engineering. In this topic, we learn the ELT pattern, a fairly recent addition to data engineering born from the explosion of cloud adoption.
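In contrast to ETL, the ELT pattern lands raw data in the warehouse first and pushes the transformation down to the database engine. A minimal sketch, again using a placeholder source API and SQLite standing in for a cloud warehouse:

import json
import sqlite3

import requests

# Extract and Load: land the raw records untouched
raw_orders = requests.get("https://example.com/api/orders").json()
with sqlite3.connect("warehouse.db") as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders (payload) VALUES (?)",
        [(json.dumps(order),) for order in raw_orders],
    )
    # Transform: let the database engine do the heavy lifting
    conn.execute("""
        CREATE TABLE IF NOT EXISTS daily_revenue AS
        SELECT date(json_extract(payload, '$.order_date')) AS order_date,
               SUM(json_extract(payload, '$.amount')) AS revenue
        FROM raw_orders
        GROUP BY date(json_extract(payload, '$.order_date'))
    """)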
Master the concepts to containerize, build, and deploy ETL pipelines into a production environment hosted on the cloud. Enable code versioning and team collaboration best practices through Git.
Create an end-to-end ETL pipeline using extract, load, and transform patterns covered in earlier topics.
Most businesses generate and store data in multiple systems such as Customer Relationship Management (CRM) systems, Order Management Systems (OMS), Accounting systems, Marketing platforms, and many more. Handcrafting the Extract and Load logic for each system is a tedious process that can be automated using data integration tools such as Airbyte.
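As an illustration, once a connection is configured in Airbyte, the extract and load job can be triggered programmatically instead of handcrafting the logic for each source. A sketch assuming a locally running open-source Airbyte deployment and a placeholder connection ID (the exact endpoint and response shape depend on your Airbyte version):

import requests

AIRBYTE_API = "http://localhost:8000/api/v1"
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

# Ask Airbyte to run the extract-and-load job for one configured connection
response = requests.post(
    f"{AIRBYTE_API}/connections/sync",
    json={"connectionId": CONNECTION_ID},
)
response.raise_for_status()
print(response.json()["job"]["status"])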
As businesses scale, their data volumes increase until it is no longer viable to process data on a single compute instance. To solve scale issues, we look at technologies like Snowflake, which are designed to perform analytical processing on large volumes of data.
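For example, a query that would overwhelm a single machine can be issued to Snowflake from Python and executed on an elastic virtual warehouse. A sketch using the snowflake-connector-python package; the account, credentials, and table names are placeholders:

import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="ANALYTICS_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
try:
    cursor = conn.cursor()
    # The aggregation runs on Snowflake's compute, not on your laptop
    cursor.execute("""
        SELECT order_date, SUM(amount) AS revenue
        FROM orders
        GROUP BY order_date
        ORDER BY order_date
    """)
    for order_date, revenue in cursor.fetchall():
        print(order_date, revenue)
finally:
    conn.close()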
We streamline our Transform pipeline development process using dbt, which popularized a subfield of data engineering known as Analytics Engineering.
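In practice, a Transform step with dbt amounts to building a project of SQL models and running the tests defined alongside them. A minimal sketch that shells out to the dbt CLI, assuming dbt is installed and a project with a configured profile lives in a transform/ directory:

import subprocess

# Build the models, then run the tests defined alongside them
for command in (["dbt", "run"], ["dbt", "test"]):
    subprocess.run(command, cwd="transform", check=True)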
Data engineering does not exist in a vacuum. As data engineers, we transform and model data for end-user consumption to power use-cases such as machine learning, business intelligence, and analytics. To model the data with the software engineering principles of modularity and reusability, we apply data modelling techniques such as dimensional modelling. To enable end users to slice and dice the models that we produce, we provide a layer on top of the data warehouse known as the semantic layer.
Spark is a distributed data processing system capable of processing large volumes of data. Databricks provides an ecosystem of tooling to enable Spark to run for multiple use-cases such as data engineering, stream processing, and machine learning.
We discover how Spark uses the separation of storage and compute to enable scale, learn about the Delta file format, use Spark for data engineering, and apply data quality tests using Great Expectations.
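For example, the same PySpark code runs unchanged on a laptop or a large Databricks cluster, because Spark reads from and writes to storage that lives separately from the compute. A sketch with placeholder storage paths, assuming Delta Lake is available (as it is by default on Databricks):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Read raw files from object storage (placeholder path)
orders = spark.read.parquet("s3://my-bucket/raw/orders/")

# Transform with distributed operations
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Write the result back to storage as a Delta table (placeholder path)
daily_revenue.write.format("delta").mode("overwrite").save("s3://my-bucket/marts/daily_revenue/")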
Create an end-to-end ETL pipeline capable of processing large volumes of data.
Data orchestration enables data engineers to stitch together different parts of ETL into a single cohesive pipeline. Data orchestration makes it easy to trigger, schedule, monitor, and configure alerts for the pipelines. Data orchestrators like Airflow come with plugins or providers that connect to existing tools in your data stack like Airbyte, dbt, Snowflake, and Databricks, so that you can easily orchestrate steps between them.
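For example, an Airflow DAG stitches individual steps into one scheduled pipeline. A minimal sketch assuming a recent Airflow 2.x release; the commands are placeholders for real extract-load and transform steps:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_load = BashOperator(task_id="extract_load", bash_command="python extract_load.py")
    transform = BashOperator(task_id="transform", bash_command="dbt run")

    # Run the transform only after the extract-load step succeeds
    extract_load >> transform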
Enable real-time insights from fast-moving data. Learn the core concepts and primitives of stream processing using Kafka, and deploy Kafka topics on Confluent Cloud. Integrate real-time events into Clickhouse, a real-time database, and perform data transformations in Clickhouse. Define and test Clickhouse materialized views using dbt.
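As an illustration, producing an event to a Kafka topic on Confluent Cloud takes only a few lines with the confluent-kafka client. The bootstrap server, credentials, topic, and event fields below are placeholders:

import json

from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "API_KEY",
    "sasl.password": "API_SECRET",
})

# Publish one listening event; a downstream consumer loads it into Clickhouse
event = {"user_id": 42, "song_id": 1001, "played_at": "2024-01-01T10:00:00Z"}
producer.produce("song_plays", key=str(event["user_id"]), value=json.dumps(event))
producer.flush()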
As the data engineering team grows, so does the code complexity. To provide assurance that data engineers are doing the right things, automated code integration pipelines can be used to test and verify a data engineer’s code changes in a separate branch-based environment. After the code has been validated, it can be automatically built and released into deployment environments such as staging and production.
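For instance, an automated integration pipeline typically runs a test suite against every branch before changes can be merged. A minimal pytest sketch for a hypothetical transformation function in a hypothetical pipeline.transform module:

# test_transform.py: executed by the CI pipeline on every pull request
import pandas as pd

from pipeline.transform import add_daily_revenue  # hypothetical project module


def test_add_daily_revenue_sums_amounts_per_day():
    orders = pd.DataFrame({
        "order_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "amount": [10.0, 5.0, 7.5],
    })
    result = add_daily_revenue(orders)
    assert result.loc[result["order_date"] == "2024-01-01", "revenue"].iloc[0] == 15.0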
Showcase all the skills and technologies you have learnt throughout the bootcamp to future employers. Implement either a lambda or kappa architecture with ETL pipelines capable of processing large volumes of data.
Python – You are comfortable with variables, lists, dictionaries, functions, loops, conditionals, and using Python libraries.
if understand:
    print("You understand the basics")
else:
    print("Take some time to learn the basics")
SQL – You are comfortable with Data Manipulation Language (DML) such as select, group by, where, having, insert, delete, update. You are comfortable with Data Definition Language (DDL) such as create table, alter table.
CASE
    WHEN understand = TRUE THEN 'You understand the basics'
    ELSE 'Take some time to learn the basics'
END;
Graduate with a portfolio of professional projects that you can showcase to the world. Take a look at some projects below from our most recent cohort.
Kickstart your career in data engineering by presenting your capstone project to Data Engineers and representatives working at startups and large companies.
A batch and streaming data pipeline that analyzes music streaming service event data. Built using Kafka, Clickhouse, Snowflake, dbt, Preset, AWS, Azure, Docker, and GitHub Actions.
Paul Hallaste, Analytics Lead at Fidelity International
Alexander Potts, Data Engineer at Endeavour Group
Receive guidance on your career trajectory, resume review, and preparation for interviews.
Receive expert advice from data engineering practitioners about industry trends, technology stack tradeoffs, and professional development.
Post your questions in the #help Slack channel and your peers or instructors will provide answers.
Join our alumni community Slack channel and stay in touch with your peers.
Attend alumni-only events and network with other data engineers in the industry.
I would recommend the bootcamp to two types of people. First, people who are interested in a career as a data engineer. The course teaches you the fundamentals for building a data stack using modern data tooling in a hands-on, steady-paced, rigorous manner. Second, people who work with data engineers and are interested in understanding the modern data stack and how different parts of the data engineering lifecycle work together. Within weeks, I was able to set up a Kafka pipeline, use dbt to transform data, and set up a CI/CD pipeline via GitHub Actions.
Paul Hallaste, Analytics Lead at Fidelity International
If you're looking to break into the industry, I think this bootcamp is probably your best choice. It will help you to have a good understanding of how to combine all the tools together into a single pipeline, use high- and low-level programming, build custom drivers and utilize the pre-built ones, and build a portfolio to showcase to potential employers. It will also provide interview tips, in addition to the opportunity to introduce your project to multiple companies at the end of the bootcamp. If you're looking for a career change, welcome to the opportunity.
Mantas Liutkus, Data and Automation Engineer at M Solutions Corp
I would 100% recommend Data Engineer Camp to both newbies to the discipline looking to change their careers, and seasoned pros looking to round out their knowledge. I learned something brand new, or went deeper on topics I already knew, almost every single week. The value of that really can't be overstated. The best thing about the bootcamp is definitely the level of detail. The bootcamp really went deep into Spark internals, which I was really impressed by, and that level of detail was maintained across every single topic.
Alexander Potts, Data Engineer at Endeavour Group
The best thing about the bootcamp is the people. But to elaborate, the team not only work as data engineers in their day jobs, but they've put this bootcamp together to share their knowledge and grow the industry. Learning from such passionate teachers is an inspiration for any student. Furthermore, learning alongside students who volunteer to give up their evenings and weekends has allowed me to connect and network with eager-to-learn, like-minded industry professionals, which is invaluable in one's career.
Luke Huntley, Data Engineer at Western Power
I wanted a course that could give me a comprehensive run-through of what the landscape is like. [...] I think the boot camp is really great for those who are not yet confident about their skills, and for those who like to learn by doing things in a structured and guided manner with more finesse and detail than what is normally available.
Nicholas Tantra, Web Systems Analyst at DMIRS
10:00am – 1:00pm (UTC)
10:00am – 1:00pm (UTC)
Bootcamp students have to be comfortable with basic Python and SQL programming concepts since the bootcamp is fast-paced and we cover a lot of ground. It is also recommended that candidates have at least 1 year of working experience before enrolling.
Computer requirements
Minimum hardware requirements
Yes, to request reimbursement, you can make a copy of our reimbursement template and send it to your manager.
Yes, students who complete 2 out of 3 projects with a passing grade will receive a certificate of completion.
The course is delivered virtually through Zoom for the flexibility of our students. Our Zoom classes consist of live lectures and hands-on labs with instructor guidance. Slack is used for student and instructor communications.
All classes are recorded in the event you are not able to make it to class.