Data Engineer | AWS Certified | ETL Pipelines & Data Warehousing | Python, SQL | AWS (S3, Glue, Lambda)
AWS Certified Data Engineer with hands-on experience building data pipelines, cloud-based data solutions, and analytical systems using Python, SQL, and AWS.
I'm an AWS Certified Data Engineer based in Durban, South Africa, currently working at the Africa Health Research Institute (AHRI), where I design and maintain production-grade ETL pipelines, Power BI dashboards, and clinical data systems that support real-world health research across Africa.
My work spans the full data lifecycle: from ingesting raw data and designing cloud-based data solutions on AWS, to transforming, validating, and delivering data for reporting and analytics. I focus on building reliable, scalable systems that turn complex, messy data into meaningful insights.
I have hands-on experience with Python, SQL, and AWS services including S3, Lambda, Glue, and RDS, applied in both professional and project-based environments.
I continuously strengthen my data engineering skills through hands-on projects focused on data pipelines, cloud architectures, and scalable data systems, and I am currently exploring real-time data pipelines and AWS-based data platforms.
📍 Location
Durban, KwaZulu-Natal, South Africa
💼 Current Role
Data Manager · Africa Health Research Institute
🎓 Education
PGDip Computer Science · Tshwane University of Technology
📜 Certifications
AWS Cloud Practitioner · AWS Data Engineer Associate
Building scalable cloud infrastructure using AWS Free Tier services for real-world data engineering workloads.
Writing production-grade Python for data ingestion, transformation, and automated quality validation pipelines.
Designing and deploying automated DAGs with Apache Airflow for scheduled, monitored, and reliable data pipelines.
Implementing Medallion Architecture (Raw → Silver → Gold) for clean, scalable, and queryable data lake designs.
Querying and analysing large datasets using SQL on Athena and Python for delivery, sales, and e-commerce insights.
Containerising services with Docker Compose and managing code with Git and GitHub for version control and CI/CD.
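As an illustration of the quality-validation step mentioned above, here is a minimal sketch in plain Python. The field names and rules are hypothetical, not taken from the actual pipelines:

```python
# Minimal data-quality validation sketch: each record is checked against
# simple rules before it is allowed into the next pipeline stage.
# Field names and rules are illustrative, not from the real pipelines.

def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty = valid)."""
    errors = []
    if not record.get("order_id"):
        errors.append("missing order_id")
    price = record.get("price")
    if not isinstance(price, (int, float)) or price < 0:
        errors.append("price must be a non-negative number")
    if record.get("currency") not in {"ZAR", "USD"}:
        errors.append("unknown currency")
    return errors

def split_valid_invalid(records):
    """Partition records into (valid, rejected-with-reasons)."""
    valid, rejected = [], []
    for r in records:
        errs = validate_record(r)
        if errs:
            rejected.append((r, errs))
        else:
            valid.append(r)
    return valid, rejected

if __name__ == "__main__":
    rows = [
        {"order_id": "A1", "price": 99.5, "currency": "ZAR"},
        {"order_id": "", "price": -5, "currency": "EUR"},
    ]
    good, bad = split_valid_invalid(rows)
    print(f"{len(good)} valid, {len(bad)} rejected")
```

Rejected records and their reasons can then be routed to a quarantine location for review rather than silently dropped.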
Production-grade data engineering pipeline on the AWS Free Tier. Ingests e-commerce data from the Fake Store API, transforms JSON to Parquet via a Medallion Architecture, catalogs with Glue, and queries with Athena, fully orchestrated by Apache Airflow DAGs running in Docker.
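The Medallion flow in this project can be sketched end to end in plain Python. The field names and aggregation here are illustrative, not the actual Fake Store API schema, and the real pipeline writes Parquet and runs under Airflow:

```python
import json

# Medallion Architecture sketch: raw (as-ingested JSON) -> silver
# (cleaned, typed records) -> gold (aggregates ready for querying).
# Field names are illustrative, not from the actual source API.

def to_silver(raw_json: str) -> list[dict]:
    """Clean the raw layer: parse JSON, drop malformed rows, coerce types."""
    silver = []
    for item in json.loads(raw_json):
        try:
            silver.append({
                "product": str(item["product"]).strip().lower(),
                "price": float(item["price"]),
                "qty": int(item["qty"]),
            })
        except (KeyError, TypeError, ValueError):
            continue  # malformed records stay in the raw layer only
    return silver

def to_gold(silver: list[dict]) -> dict[str, float]:
    """Aggregate the silver layer into per-product revenue."""
    revenue: dict[str, float] = {}
    for row in silver:
        revenue[row["product"]] = revenue.get(row["product"], 0.0) + row["price"] * row["qty"]
    return revenue

if __name__ == "__main__":
    raw = '[{"product": " Shirt ", "price": "10.0", "qty": 2}, {"product": "Hat"}]'
    print(to_gold(to_silver(raw)))  # the record missing "price" is dropped
```

In the real pipeline each layer lands in its own S3 prefix, with Glue cataloging the gold layer for Athena.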
Full delivery data pipeline project: ingesting, processing, and analysing delivery data through an automated data engineering workflow built for real-world logistics use cases.
In-depth analysis of delivery data: exploring patterns, performance metrics, and operational insights using Python and data analysis techniques to drive business decisions.
Designed and implemented a sales data warehouse, structured for efficient querying and reporting on sales performance, trends, and KPIs using modern data warehousing principles.
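One common shape for such a warehouse is a star schema: a central fact table joined to dimension tables. A minimal sketch using SQLite (table and column names are hypothetical, not the actual warehouse schema):

```python
import sqlite3

# Star-schema sketch: one fact table keyed to dimension tables, the shape
# commonly used for sales reporting. Schema names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, iso_date TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_date    VALUES (10, '2024-01-01');
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 10, 50.0), (2, 10, 75.0);
""")

# Typical KPI query: revenue per product, joining the fact table to a dimension.
rows = conn.execute("""
    SELECT p.name, SUM(f.amount) AS revenue
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('widget', 150.0), ('gadget', 75.0)]
```

Keeping descriptive attributes in dimensions and measures in the fact table keeps reporting queries simple and fast to aggregate.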
Analysis of facility visit data: uncovering usage patterns, peak times, and operational insights to support data-driven facility management and resource planning decisions.
Cloud infrastructure project deploying a Node.js application on AWS EC2 with Nginx as a reverse proxy, demonstrating cloud deployment, server configuration, and DevOps skills.
Studying ETL design principles and data warehouse architecture patterns based on The Data Warehouse ETL Toolkit by Ralph Kimball and Joe Caserta. Focus areas include ETL design patterns, dimensional modeling foundations, and best practices for building scalable data pipelines, applied alongside Python self-study.
Hands-on Python development focused on building automation scripts and data processing solutions through daily structured practice. Focus areas include Python fundamentals, file handling and data transformation, automation workflows, and clean and maintainable code practices. All exercises documented on GitHub.
Self-directed learning grounded in core principles of modern data systems, translating theory into practical implementations through hands-on exercises. Focus areas include data pipeline design, ETL processes, data architecture fundamentals, and end-to-end data flow understanding. Progress documented on GitHub through small projects that bridge concept and practice.
End-to-end data lake ingesting South African and African artist data from the YouTube API, Last.fm, and MusicBrainz, featuring automated ETL scripts and a live Power BI dashboard.
Africa Health Research Institute (AHRI) · Full-time
Durban, KwaZulu-Natal · Hybrid
CSG · Internship
Centurion, Gauteng · Hybrid
Amazon Web Services
Data Engineering Certification
Computer Science