Data Engineer | AWS Certified | ETL Pipelines & Data Warehousing | Python, SQL | AWS (S3, Glue, Lambda)
AWS Certified Data Engineer with hands-on experience building data pipelines, cloud-based data solutions, and analytical systems using Python, SQL, and AWS.
I'm an AWS Certified Data Engineer based in Durban, South Africa, currently working at the Africa Health Research Institute (AHRI), where I design and maintain production-grade ETL pipelines, Power BI dashboards, and clinical data systems that support real-world health research across Africa.
My work spans the full data lifecycle: from ingesting raw data and designing cloud-based data solutions on AWS, to transforming, validating, and delivering data for reporting and analytics. I focus on building reliable, scalable systems that turn complex, messy data into meaningful insights.
I have hands-on experience with Python, SQL, and AWS services including S3, Lambda, Glue, and RDS, applied in both professional and project-based environments.
I continuously strengthen my data engineering skills through hands-on projects focused on data pipelines, cloud architectures, and scalable data systems, and I am currently exploring real-time data pipelines and AWS-based data platforms.
📍 Location
Durban, KwaZulu-Natal, South Africa
💼 Current Role
Data Manager · Africa Health Research Institute
🎓 Education
PGDip Computer Science · Tshwane University of Technology
📜 Certifications
AWS Cloud Practitioner · AWS Data Engineer Associate
Building scalable cloud infrastructure using AWS Free Tier services for real-world data engineering workloads.
Writing production-grade Python for data ingestion, transformation, and automated quality validation pipelines.
Designing and deploying automated DAGs with Apache Airflow for scheduled, monitored, and reliable data pipelines.
Implementing Medallion Architecture (Raw → Silver → Gold) for clean, scalable, and queryable data lake designs.
Querying and analysing large datasets using SQL on Athena and Python for delivery, sales, and e-commerce insights.
Containerising services with Docker Compose and managing code with Git and GitHub for version control and CI/CD.
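As an illustration of the quality-validation step mentioned above, here is a minimal sketch in plain Python. The field names and rules are hypothetical, not taken from the actual pipelines:

```python
# Minimal data-quality validation sketch: each record is checked against
# simple rules before it is allowed into the next pipeline stage.
# Field names and rules are illustrative, not from the real pipelines.

def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty = valid)."""
    errors = []
    if not record.get("order_id"):
        errors.append("missing order_id")
    price = record.get("price")
    if not isinstance(price, (int, float)) or price < 0:
        errors.append("price must be a non-negative number")
    if record.get("currency") not in {"ZAR", "USD"}:
        errors.append("unknown currency")
    return errors

def split_valid_invalid(records):
    """Partition records into (valid, rejected-with-reasons)."""
    valid, rejected = [], []
    for r in records:
        errs = validate_record(r)
        if errs:
            rejected.append((r, errs))
        else:
            valid.append(r)
    return valid, rejected

if __name__ == "__main__":
    rows = [
        {"order_id": "A1", "price": 99.5, "currency": "ZAR"},
        {"order_id": "", "price": -5, "currency": "EUR"},
    ]
    good, bad = split_valid_invalid(rows)
    print(f"{len(good)} valid, {len(bad)} rejected")
```

Rejected records and their reasons can then be routed to a quarantine location for review rather than silently dropped.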
Production-grade data engineering pipeline on the AWS Free Tier. Ingests e-commerce data from the Fake Store API, transforms JSON to Parquet via a Medallion Architecture, catalogs with Glue, and queries with Athena, fully orchestrated by Apache Airflow DAGs running in Docker.
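The Medallion flow in this project can be sketched end to end in plain Python. The field names and aggregation here are illustrative, not the actual Fake Store API schema, and the real pipeline writes Parquet and runs under Airflow:

```python
import json

# Medallion Architecture sketch: raw (as-ingested JSON) -> silver
# (cleaned, typed records) -> gold (aggregates ready for querying).
# Field names are illustrative, not from the actual source API.

def to_silver(raw_json: str) -> list[dict]:
    """Clean the raw layer: parse JSON, drop malformed rows, coerce types."""
    silver = []
    for item in json.loads(raw_json):
        try:
            silver.append({
                "product": str(item["product"]).strip().lower(),
                "price": float(item["price"]),
                "qty": int(item["qty"]),
            })
        except (KeyError, TypeError, ValueError):
            continue  # malformed records stay in the raw layer only
    return silver

def to_gold(silver: list[dict]) -> dict[str, float]:
    """Aggregate the silver layer into per-product revenue."""
    revenue: dict[str, float] = {}
    for row in silver:
        revenue[row["product"]] = revenue.get(row["product"], 0.0) + row["price"] * row["qty"]
    return revenue

if __name__ == "__main__":
    raw = '[{"product": " Shirt ", "price": "10.0", "qty": 2}, {"product": "Hat"}]'
    print(to_gold(to_silver(raw)))  # the record missing "price" is dropped
```

In the real pipeline each layer lands in its own S3 prefix, with Glue cataloging the gold layer for Athena.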
Full delivery data pipeline project: ingesting, processing, and analysing delivery data through an automated data engineering workflow built for real-world logistics use cases.
In-depth analysis of delivery data: exploring patterns, performance metrics, and operational insights using Python and data analysis techniques to drive business decisions.
Designed and implemented a sales data warehouse, structured for efficient querying and reporting on sales performance, trends, and KPIs using modern data warehousing principles.
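One common shape for such a warehouse is a star schema: a central fact table joined to dimension tables. A minimal sketch using SQLite (table and column names are hypothetical, not the actual warehouse schema):

```python
import sqlite3

# Star-schema sketch: one fact table keyed to dimension tables, the shape
# commonly used for sales reporting. Schema names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, iso_date TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_date    VALUES (10, '2024-01-01');
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 10, 50.0), (2, 10, 75.0);
""")

# Typical KPI query: revenue per product, joining the fact table to a dimension.
rows = conn.execute("""
    SELECT p.name, SUM(f.amount) AS revenue
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('widget', 150.0), ('gadget', 75.0)]
```

Keeping descriptive attributes in dimensions and measures in the fact table keeps reporting queries simple and fast to aggregate.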
Analysis of facility visit data: uncovering usage patterns, peak times, and operational insights to support data-driven facility management and resource planning decisions.
Cloud infrastructure project deploying a Node.js application on AWS EC2 with Nginx as a reverse proxy, demonstrating cloud deployment, server configuration, and DevOps skills.
Studying ETL design principles and data warehouse architecture patterns based on The Data Warehouse ETL Toolkit by Ralph Kimball and Joe Caserta. Focus areas include ETL design patterns, dimensional modeling foundations, and best practices for building scalable data pipelines, applied alongside Python self-study.
Hands-on Python development focused on building automation scripts and data processing solutions through daily structured practice. Focus areas include Python fundamentals, file handling and data transformation, automation workflows, and clean and maintainable code practices. All exercises documented on GitHub.
Self-directed learning grounded in core principles of modern data systems, translating theory into practical implementations through hands-on exercises. Focus areas include data pipeline design, ETL processes, data architecture fundamentals, and end-to-end data flow understanding. Progress documented on GitHub through small projects that bridge concept and practice.
End-to-end data lake ingesting South African and African artist data from the YouTube API, Last.fm, and MusicBrainz, featuring automated ETL scripts and a live Power BI dashboard.
Africa Health Research Institute (AHRI) · Full-time
Durban, KwaZulu-Natal · Hybrid
CSG · Internship
Centurion, Gauteng · Hybrid
Amazon Web Services
Data Engineering Certification
Computer Science