Our advanced data engineering course is intense. To be well-prepared and get the most out of the bootcamp, you must complete 40 hours of preparation work to develop your tech foundations.
What you will do in practice:
- Developer skill refresher: Linux, GitHub, and Git
- In-depth exploration of Python fundamentals
- Intermediate SQL refresher
Build the foundation for data engineering (40h)
Kickstart your journey into data engineering with a deep dive into core concepts and tools, from Python and CI/CD best practices to Docker, setting a strong foundation for your growth in the field.
What you will do in practice:
- Set up your own virtual machine with Visual Studio Code
- Build your first data lake and implement data transformations with Python
- Apply CI/CD techniques using Ruff, Pylint, GitHub, and Poetry
- Deploy a FastAPI app into production using Docker
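A deployment like the last step might start from a Dockerfile along these lines (a minimal sketch; the app path app/main.py and the requirements.txt file are hypothetical placeholders):

```dockerfile
# Minimal image for a FastAPI app (hypothetical layout: app/main.py defines `app`)
FROM python:3.11-slim
WORKDIR /srv
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ app/
# Serve with uvicorn on port 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```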
Create a data warehouse (40h)
Work on the central piece of your modern data stack: the data warehouse. Elevate your SQL and Postgres skills and learn to use BigQuery as a data engineer. Also, discover Docker Compose for handling multi-container Docker applications.
What you will do in practice:
- Create a data warehouse with BigQuery and set up access for your team
- Import data using advanced SQL skills, Fivetran & Airbyte
- Set up a Postgres instance entirely from scratch and compare that to managed solutions
- Utilize Docker Compose for local setup and testing of complex setups such as sharded databases
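A local Postgres setup like the one above can be sketched with a Docker Compose file (service name, credentials, and volume are illustrative placeholders, not part of the course material):

```yaml
# docker-compose.yml: local Postgres for development (placeholder values)
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: demo
      POSTGRES_PASSWORD: demo   # development only; never commit real secrets
      POSTGRES_DB: warehouse
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```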
Software and languages you will learn: BigQuery, SQL, Docker, PostgreSQL
Organize your data for visualization (40h)
Deepen your understanding of ETL, ELT, and ETLT processes with Airflow and DBT. Prepare your data for various data visualization tools and orchestrate your Docker containers with Kubernetes.
What you will do in practice:
- Implement and optimize ETL workflows using Airflow
- Build and manage data pipelines with DBT, with a focus on modularity, testing, and version control
- Combine Airflow and DBT together
- Get introduced to Kubernetes and how to deploy to a production cluster
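The ETL pattern that Airflow orchestrates can be sketched in plain Python, independent of Airflow itself (stdlib only; the table and column names are illustrative):

```python
import csv
import io
import sqlite3

# Extract: read raw rows (here from an in-memory CSV; in Airflow this would be one task)
raw = io.StringIO("user_id,amount\n1,10.5\n2,3.0\n1,4.5\n")
rows = list(csv.DictReader(raw))

# Transform: cast types and aggregate amounts per user
totals = {}
for row in rows:
    user_id = int(row["user_id"])
    totals[user_id] = totals.get(user_id, 0.0) + float(row["amount"])

# Load: write the aggregate into a warehouse table (sqlite stands in for BigQuery/Postgres)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_totals (user_id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO user_totals VALUES (?, ?)", sorted(totals.items()))
conn.commit()

print(conn.execute("SELECT user_id, total FROM user_totals ORDER BY user_id").fetchall())
# → [(1, 15.0), (2, 3.0)]
```

In a real DAG each of the three steps would be its own Airflow task, so failures can be retried independently.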
Software and languages you will learn: SQL, DBT, Kubernetes, Airflow
Optimize data workloads of any size (40h)
Learn to manage larger workloads and data transfers, explore the realm of streaming data at scale, and grasp the essential aspects of logging and monitoring.
What you will do in practice:
- Leverage PySpark for transforming massive amounts of data
- Implement data streaming solutions with Kafka and Pub/Sub
- Transform streaming data in real time with Apache Beam
- Learn how to manage and monitor your data solutions as your data workload increases
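The map/reduce style that PySpark scales to massive datasets can be sketched in plain Python, with no Spark cluster needed (the event data is illustrative):

```python
from functools import reduce

# PySpark-style pipeline in plain Python: map to (key, 1) pairs, then fold per key.
events = ["click", "view", "click", "click", "view"]

# map step: emit (key, 1) pairs, as rdd.map(lambda e: (e, 1)) would in PySpark
pairs = [(e, 1) for e in events]

# reduceByKey-like step: sum counts per event type
def reduce_by_key(acc, pair):
    key, count = pair
    acc[key] = acc.get(key, 0) + count
    return acc

counts = reduce(reduce_by_key, pairs, {})
print(counts)  # → {'click': 3, 'view': 2}
```

PySpark applies the same two steps, but partitions the pairs across executors so each machine reduces its shard before results are merged.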
Software and languages you will learn: Python
Conduct a comprehensive project (40h)
Design and build a data engineering project from the ground up. Integrate a variety of solutions from the modern data stack. Deliver data to end users and deploy your projects into production.
What you will do in practice:
- Practice data engineering as a team: the ADR process and Identity and Access Management (IAM)
- Use Terraform to create your infrastructure
- Weigh the pros and cons of graph databases
- Learn when to use document DBs and wide-column DBs
- Apply the tools and technologies acquired during the modules in practical situations
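The Terraform step above could start from a fragment like this (a minimal sketch; the provider, project id, and bucket names are placeholder assumptions):

```hcl
# main.tf: provision a data-lake landing bucket on GCP (placeholder names)
provider "google" {
  project = "my-data-platform"   # hypothetical project id
  region  = "europe-west1"
}

# A GCS bucket as the raw landing zone of the data lake
resource "google_storage_bucket" "raw" {
  name     = "my-data-platform-raw"
  location = "EU"
}
```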
Start your career in tech!
From the moment you embark on your tech journey, our dedicated career services team is there to provide you with tailored guidance throughout and beyond your training. Connect with our vast network of 1000+ hiring partners, benefit from 1:1 coaching and much more!
What you will do in practice:
- Elevate your personal branding & get technical interview training
- Benefit from 1:1 coaching & key alumni Q&A sessions
- Connect with an extensive network of 1000+ hiring partners
- Get lifetime access to Le Wagon content