
Production-shaped data engineering portfolio

Ingest to Insight

A full platform case study of how I design trusted data systems: source ownership, CDC, governed batch and streaming pipelines, warehouse publishing, dbt Mesh, observability, and safe AI access.

14 Implemented platform capabilities
13+ Dataset contracts across batch and mesh products
7 Operational surfaces from API to SLOs

Why this matters to senior readers

It demonstrates platform judgment, not just tool familiarity.

The repo shows how data reliability, governance, developer ergonomics, and consumer trust fit together in a system a team could realistically operate.

Reliability

Idempotent, auditable data movement

Atomic object layouts, manifest state transitions, source-target reconciliation, commit markers, and repair paths protect the batch lane from duplicates and partial publishes.
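A minimal sketch of the publish-then-commit pattern, assuming a local directory stands in for a MinIO bucket. The `_manifest.json` and `_COMMITTED` names and the `publish_partition`/`is_ready` helpers are hypothetical, not the repo's actual layout. Data files are written first, then a manifest of SHA-256 checksums, and the commit marker last, so a reader that waits for the marker never sees a partial batch, and re-running a committed batch is a no-op.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical layout: <root>/<dataset>/batch_id=<id>/ plus a manifest and a commit marker.
MANIFEST = "_manifest.json"
MARKER = "_COMMITTED"


def publish_partition(root: Path, dataset: str, batch_id: str, files: dict) -> Path:
    """Write data files, then the manifest, then the commit marker (always last)."""
    target = root / dataset / f"batch_id={batch_id}"
    if (target / MARKER).exists():
        return target  # Already committed: re-running the same batch changes nothing.
    target.mkdir(parents=True, exist_ok=True)
    checksums = {}
    for name, payload in files.items():
        (target / name).write_bytes(payload)  # Overwrites leftovers from a crashed attempt.
        checksums[name] = hashlib.sha256(payload).hexdigest()
    manifest = {"batch_id": batch_id, "files": checksums}
    (target / MANIFEST).write_text(json.dumps(manifest, sort_keys=True))
    (target / MARKER).touch()  # Only now does the batch become visible to readers.
    return target


def is_ready(partition: Path) -> bool:
    """Readiness gate: the marker exists and every manifest entry matches its checksum."""
    if not (partition / MARKER).exists():
        return False
    manifest = json.loads((partition / MANIFEST).read_text())
    return all(
        (partition / name).exists()
        and hashlib.sha256((partition / name).read_bytes()).hexdigest() == digest
        for name, digest in manifest["files"].items()
    )


# Usage: publish a batch, confirm it is ready, then show that tampering is detected.
with tempfile.TemporaryDirectory() as tmp:
    part = publish_partition(Path(tmp), "orders", "2024-01-01", {"part-0.csv": b"id\n1\n"})
    assert is_ready(part)
    (part / "part-0.csv").write_bytes(b"tampered")
    assert not is_ready(part)
```

On a real object store the same ordering applies; the marker is what downstream readiness checks key on.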

Governance

Controls at the point where risky operations happen

Dataset pause switches, approvals, maintenance windows, break-glass overrides, lineage, contracts, and audit tables turn governance into runtime behavior.
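What makes these controls runtime behavior is that the check sits in the execution path and always leaves a trail. A minimal sketch, assuming a hypothetical `DatasetControls` record and an in-memory list standing in for the audit table; the precedence shown (break-glass first, then pause, maintenance, and approval) is one reasonable ordering, not necessarily the repo's.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import List, Optional, Tuple


@dataclass
class DatasetControls:
    """Hypothetical per-dataset control record; field names are illustrative."""
    paused: bool = False
    requires_approval: bool = False
    approved_by: Optional[str] = None
    # Same-day windows only; windows that cross midnight are not handled in this sketch.
    maintenance_windows: List[Tuple[time, time]] = field(default_factory=list)
    break_glass_reason: Optional[str] = None


def evaluate_run(dataset: str, controls: DatasetControls, now: datetime, audit: list) -> bool:
    """Decide whether a run may start and record the decision, allowed or not."""
    in_window = any(start <= now.time() < end for start, end in controls.maintenance_windows)
    if controls.break_glass_reason:
        allowed, reason = True, f"break_glass: {controls.break_glass_reason}"
    elif controls.paused:
        allowed, reason = False, "paused"
    elif in_window:
        allowed, reason = False, "maintenance_window"
    elif controls.requires_approval and not controls.approved_by:
        allowed, reason = False, "awaiting_approval"
    else:
        allowed, reason = True, "allowed"
    audit.append({"dataset": dataset, "at": now.isoformat(), "allowed": allowed, "reason": reason})
    return allowed
```

Recording denied runs as well as allowed ones is deliberate: the audit trail should explain why something did not run, not only what did.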

Scale

Metadata-driven platform boundaries

Registry-backed discovery and orchestrator adapters let new pipelines and data products be added through metadata instead of hardcoded API changes.
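A minimal sketch of metadata-driven dispatch, written in Python for illustration even though the control plane itself is .NET. The registry entries, required fields, and adapter functions are hypothetical: adding a pipeline means adding an entry, and only a new orchestrator type needs new code.

```python
import json
from typing import Callable, Dict

# Hypothetical registry content; the real registry/ files may use different fields.
REGISTRY_JSON = """
[
  {"name": "orders_daily", "orchestrator": "airflow", "owner": "retail"},
  {"name": "payments_stream", "orchestrator": "spark_streaming", "owner": "banking"}
]
"""
REQUIRED = {"name", "orchestrator", "owner"}


def airflow_adapter(entry: dict) -> str:
    # Placeholder: a real adapter would call the orchestrator's API here.
    return f"airflow:trigger:{entry['name']}"


def spark_streaming_adapter(entry: dict) -> str:
    return f"spark:submit:{entry['name']}"


ADAPTERS: Dict[str, Callable[[dict], str]] = {
    "airflow": airflow_adapter,
    "spark_streaming": spark_streaming_adapter,
}


def discover(registry_json: str) -> Dict[str, dict]:
    """Load registry entries into a catalog, rejecting entries that lack required metadata."""
    catalog = {}
    for entry in json.loads(registry_json):
        missing = REQUIRED - set(entry)
        if missing:
            raise ValueError(f"{entry.get('name', '?')}: missing {sorted(missing)}")
        catalog[entry["name"]] = entry
    return catalog


def run_pipeline(name: str, catalog: Dict[str, dict]) -> str:
    """Dispatch through whichever adapter the metadata names; no per-pipeline code paths."""
    entry = catalog[name]
    adapter = ADAPTERS.get(entry["orchestrator"])
    if adapter is None:
        raise ValueError(f"no adapter for orchestrator {entry['orchestrator']!r}")
    return adapter(entry)
```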

Business use

Trusted analytics and AI access

Postgres-first dbt Mesh marts, public model contracts, certified products, semantic metadata, and a read-only SQL assistant show the path from engineering controls to useful decisions.

Architecture

From operational systems to governed data products.

Retail, banking, and commerce sources feed CDC, batch ingestion, object storage, Spark processing, warehouse serving schemas, dbt Mesh marts, BI, governance APIs, and an AI assistant.

Sources: MySQL OLTP, retail and bank source apps, Commerce REST API
Capture: Debezium CDC, Kafka and Schema Registry, API event poller
Platform runtime: MinIO object lake, Airflow orchestration, Spark batch and streaming
Serving: Postgres warehouse, Snowflake adapter path, dbt Mesh marts
Consumers: BI dashboards, Governance API, AI warehouse assistant

Clear ownership boundaries

Source systems own writes, Debezium captures changes, Airflow and Spark materialize data, dbt owns public analytical contracts, and the control plane exposes metadata without duplicating orchestration logic.

Portable deployment thinking

Compose, Helm, Kustomize, and Terraform assets show how the same platform concerns move from local development into cloud or Kubernetes environments.

Component map

Every layer has a job, contract, and audience.

01

Source applications

Retail, bank, and commerce services model application-owned writes, protected business endpoints, metrics, and outbox events.

sources/, oltp/mysql/
02

CDC and streaming

Debezium, Kafka, Schema Registry, and Spark streaming separate raw row capture from business events and downstream readiness.

platform/schema-registry/, pipelines/spark/
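Debezium change events carry an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read), `before` and `after` row images, and `source` metadata such as the table name. A minimal sketch of keeping raw row capture apart from business events, assuming events are already deserialized to their payload, tombstone records are filtered out upstream, and services write business events to a table named `outbox` (a hypothetical name):

```python
from typing import Dict, List


def route_change(event: dict, replicas: Dict[str, dict], business_events: List[dict],
                 key: str = "id") -> None:
    """Route one Debezium-style change event (already unwrapped to its payload).

    Raw row changes update a per-table replica; rows from the hypothetical `outbox`
    table are treated as application-authored business events instead.
    """
    table = event["source"]["table"]
    op, before, after = event["op"], event.get("before"), event.get("after")
    if table == "outbox":
        if op == "c":  # Only new outbox rows are events; cleanup deletes are ignored.
            business_events.append(after)
        return
    replica = replicas.setdefault(table, {})
    if op in ("c", "r", "u"):  # create, snapshot read, update: upsert the latest row image
        replica[after[key]] = after
    elif op == "d":  # delete: only the `before` image is populated
        replica.pop(before[key], None)
    else:
        raise ValueError(f"unexpected op {op!r}")
```

Downstream consumers then read business meaning from outbox events instead of reverse-engineering it from column diffs.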
03

Batch runtime

Airflow DAGs and runtime plugins handle watermarks, late data, backfills, contracts, manifests, and atomic publish repair.

pipelines/airflow/
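A minimal sketch of incremental extraction with a late-data lookback, assuming rows carry a hypothetical `updated_at` timestamp and an `id` key. The window deliberately overlaps the previous run by `lateness`, and the merge keeps the newest version per key so the overlap cannot create duplicates. Rows that arrive later than the lookback still need a backfill.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def next_window(high_watermark: datetime, now: datetime,
                lateness: timedelta) -> Tuple[datetime, datetime]:
    """Extraction window that re-reads a lookback slice behind the watermark for late rows."""
    return high_watermark - lateness, now


def merge_batch(existing: Dict, rows: List[dict], key: str = "id",
                ts: str = "updated_at") -> Dict:
    """Keep the newest version per key, so overlapping windows and re-runs are idempotent."""
    merged = dict(existing)
    for row in rows:
        current = merged.get(row[key])
        if current is None or row[ts] >= current[ts]:
            merged[row[key]] = row
    return merged


def advance_watermark(old: datetime, rows: List[dict], ts: str = "updated_at") -> datetime:
    """Move the watermark forward only; an empty or fully late batch leaves it unchanged."""
    return max([old, *(row[ts] for row in rows)])
```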
04

Object lake and warehouse

MinIO layouts, Postgres serving schemas, and Snowflake-ready DDL demonstrate the storage and serving split.

warehouse/, docker-compose.yaml
05

Contracts and registry

JSON contracts and Git-versioned registry files define schema, SLA, owners, sensitivity, certification, and discovery.

contracts/v1/, registry/
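A minimal sketch of contract-driven row validation. The contract shape below (column types, `required`, `warn_if_above`, plus owner, SLA, and sensitivity metadata) is hypothetical and simpler than whatever `contracts/v1/` actually contains; the point is that one file drives both documentation and the accepted/warning/rejected split.

```python
import json

# Hypothetical contract; field names are illustrative, not the repo's schema.
CONTRACT = json.loads("""
{
  "dataset": "retail.orders",
  "owner": "retail-data",
  "sla_minutes": 60,
  "sensitivity": "internal",
  "columns": {
    "order_id": {"type": "int", "required": true},
    "amount": {"type": "float", "required": true, "warn_if_above": 10000},
    "email": {"type": "str", "required": false}
  }
}
""")
PY_TYPES = {"int": int, "float": (int, float), "str": str}


def validate(rows, contract):
    """Split rows into accepted, warning (flagged for review), and rejected."""
    accepted, warned, rejected = [], [], []
    for row in rows:
        errors, warnings = [], []
        for col, spec in contract["columns"].items():
            value = row.get(col)
            if value is None:
                if spec.get("required"):
                    errors.append(f"{col}: missing")
                continue
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, PY_TYPES[spec["type"]]):
                errors.append(f"{col}: expected {spec['type']}")
            elif "warn_if_above" in spec and value > spec["warn_if_above"]:
                warnings.append(f"{col}: above {spec['warn_if_above']}")
        if errors:
            rejected.append({"row": row, "errors": errors})
        elif warnings:
            warned.append({"row": row, "warnings": warnings})
        else:
            accepted.append(row)
    return accepted, warned, rejected
```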
06

.NET control plane

A metadata-backed API exposes data products, pipelines, health, lineage, alerts, approvals, and workflow delegation.

apps/api/
07

dbt Mesh

Domain-owned marts, public model access, source freshness, semantic assets, exposures, and certification evidence sit downstream of the governed serving layer.

warehouse/dbt/
08

AI warehouse assistant

Combines pgvector metadata retrieval, deterministic SQL templates, query validation, read-only execution, JWT-based roles, auditing, and feedback loops.

apps/ai-assistant/
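Templates do most of the safety work, but a validator in front of execution is a useful second gate. A deliberately naive sketch, assuming a hypothetical approved schema named `marts`: it allows a single SELECT/WITH statement, rejects write and DDL keywords, and checks schema-qualified table references. A real guard should use a proper SQL parser and still run under a read-only database role, since regex checks like these can be fooled and ignore unqualified table names.

```python
import re
from typing import List

APPROVED_SCHEMAS = {"marts"}  # hypothetical approved mart schema
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)
# Captures the schema part of `FROM schema.table` / `JOIN schema.table`.
SCHEMA_REF = re.compile(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)\.", re.IGNORECASE)


def validate_sql(sql: str) -> List[str]:
    """Return a list of problems; an empty list means the query passed these checks."""
    problems = []
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        problems.append("multiple statements")
    if not re.match(r"\s*(select|with)\b", statement, re.IGNORECASE):
        problems.append("only SELECT/WITH queries are allowed")
    if WRITE_KEYWORDS.search(statement):
        problems.append("write or DDL keyword found")
    for schema in SCHEMA_REF.findall(statement):
        if schema.lower() not in APPROVED_SCHEMAS:
            problems.append(f"schema not approved: {schema}")
    return problems
```

Returning every problem rather than failing on the first one makes the rejection reason easy to show next to the generated SQL.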

Production signals

What the implementation makes visible.

The most valuable signal is not the list of tools. It is the way failure modes, review paths, deployment boundaries, and user trust are designed into the platform.

Reliability loop

  1. Contracts validate rows and split accepted, warning, and rejected records.
  2. Atomic object layouts publish with manifests, checksums, and commit markers.
  3. Readiness sensors gate downstream warehouse and dbt work.
  4. Reconciliation compares source and target rows, hashes, and sums.
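A minimal sketch of the reconciliation step, assuming both sides have been pulled into memory and normalized to the same Python types (a float on one side and a `Decimal` on the other would hash differently). Row count, an order-independent content hash, and a control total on a hypothetical `amount` column are compared separately, so a failure says which kind of drift occurred.

```python
import hashlib
from decimal import Decimal
from typing import Dict, List


def fingerprint(rows: List[dict], key: str = "id", amount: str = "amount") -> Dict[str, object]:
    """Row count, order-independent hash, and a control sum for one side of the comparison."""
    digest = hashlib.sha256()
    total = Decimal("0")
    for row in sorted(rows, key=lambda r: r[key]):  # sort so physical row order doesn't matter
        digest.update(repr(sorted(row.items())).encode())
        total += Decimal(str(row[amount]))  # str() avoids binary float artifacts in the sum
    return {"row_count": len(rows), "hash": digest.hexdigest(), "amount_sum": total}


def reconcile(source: List[dict], target: List[dict], **kwargs) -> Dict[str, bool]:
    """Compare each check independently and report which ones matched."""
    src, tgt = fingerprint(source, **kwargs), fingerprint(target, **kwargs)
    return {check: src[check] == tgt[check] for check in src}
```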

Operations loop

  1. Freshness SLAs and SLO burn-rate rules expose data product health.
  2. Prometheus, Alertmanager, Grafana, Jaeger, and ELK cover metrics, alerts, traces, and logs.
  3. Backfill previews estimate rows, runtime, cost, and overwrite risk before execution.
  4. Restore drills rehearse Postgres and MinIO restores, Kafka replay, and dbt rebuild validation.
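Burn rate is the observed bad-event ratio divided by the error budget (1 minus the SLO target), so a burn rate of 1 spends the budget exactly over the SLO period. A minimal sketch of a multiwindow page condition, where a "bad event" might be a failed freshness check; the 14.4 threshold follows the example in Google's SRE Workbook for a 30-day SLO and is not necessarily what the repo's Prometheus rules use.

```python
from typing import Tuple


def burn_rate(bad: int, total: int, slo_target: float) -> float:
    """Observed error ratio divided by the error budget (1 - SLO target)."""
    if total == 0:
        return 0.0  # no events in the window: nothing is burning
    return (bad / total) / (1.0 - slo_target)


def should_page(long_window: Tuple[int, int], short_window: Tuple[int, int],
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast: the long window filters blips, and the
    short window lets the alert clear quickly once the problem stops."""
    return (burn_rate(*long_window, slo_target) >= threshold
            and burn_rate(*short_window, slo_target) >= threshold)
```

Each window is passed as a `(bad, total)` pair, for example freshness checks failed versus run over the last hour and the last five minutes.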

Governance loop

  1. Dataset controls handle pause, approval, maintenance, concurrency, and emergency override.
  2. Lineage payloads publish source, object, target, contract, quality, and reconciliation context.
  3. Certified dbt products expose owners, SLAs, sensitivity, and reviewer metadata.
  4. AI-assisted access stays inside approved marts with transparent generated SQL.

Engineering signal

How this work translates across data, platform, and AI teams.

This project is built to make engineering judgment visible: deciding where contracts live, how runtime controls should be enforced, how observability should map to ownership, and how AI can be introduced without bypassing warehouse governance.

Lead production-minded delivery

Translate ambiguous data platform goals into maintainable services, runtime contracts, validation gates, and deployable assets.

Bridge engineering and analytics

Connect ingestion, CDC, warehouse design, dbt Mesh ownership, BI, and AI-assisted discovery into one operating model.

Design for trust

Make freshness, lineage, approvals, reconciliation, audit, and recovery visible before stakeholders have to ask for them.