CASE 006  /  LOGISTICS · DATA OPS

A production data pipeline on Databricks turned a 5-day data mess into hourly-fresh dashboards.

A logistics operator was generating millions of operational rows per month across 6 source systems, but their analysts couldn't trust any of it. We rebuilt the ingestion, cleanup and quality-monitoring layer on Databricks — with versioned data contracts, anomaly detection, and a Slack-first alerting system.
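For a sense of what "versioned data contract" meant in practice, here is a minimal sketch in Python; the source name, columns, and SLA below are invented for illustration, not the client's real schema.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DataContract:
        source: str               # upstream system the contract covers
        version: str              # bumped whenever the agreed schema changes
        required_columns: dict    # column name -> expected Spark type name
        freshness_sla_hours: int  # how stale the feed may get before we alert

    # Hypothetical contract for a TMS shipments feed.
    TMS_SHIPMENTS_V2 = DataContract(
        source="tms.shipments",
        version="2.1.0",
        required_columns={
            "shipment_id": "string",
            "carrier_id": "string",
            "picked_up_at": "timestamp",
            "delivered_at": "timestamp",
        },
        freshness_sla_hours=1,
    )

    def schema_violations(df, contract: DataContract) -> list:
        """List human-readable contract violations for a Spark DataFrame."""
        actual = {f.name: f.dataType.simpleString() for f in df.schema.fields}
        problems = []
        for col, expected in contract.required_columns.items():
            if col not in actual:
                problems.append(f"{contract.source} v{contract.version}: missing column '{col}'")
            elif actual[col] != expected:
                problems.append(f"{contract.source} v{contract.version}: '{col}' is {actual[col]}, expected {expected}")
        return problems

In this sketch the contract lives next to the pipeline code and is checked on every ingestion run, so a schema change shows up as a failed check, not a quietly broken dashboard.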

5d → 1h  DATA FRESHNESS
90%  INGESTION AUTOMATED
100%  SLA MET
5 wk  BUILD
LOGISTICS · MAR 2026 · OPERATIONAL DATA REDACTED
THE PROBLEM

Six data sources, three Excel masters of truth, zero confidence in any number.

The operator had a TMS, a fleet system, two customs platforms, a fuel-card feed, and a homegrown invoicing tool. Six sources, no shared keys, no shared schema. Their analytics team rebuilt the same joins every Monday in three different Excel files — and they regularly disagreed.

Executives had stopped asking for KPIs because the numbers kept changing between meetings. The CFO told us bluntly: "I don't trust any single dashboard in this company."

THE INVESTIGATION

We instrumented the existing pipelines for two weeks before drawing a single arrow.

Most data-platform engagements start with whiteboards. We started with a passive observability layer — measuring ingestion latency, error rates, and schema drift on the existing pipelines. After two weeks we had a real picture: 47% of ingestion jobs were failing silently, 22% of records had at least one quality issue, and the worst offender (the customs feed) was the one nobody wanted to touch.
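A simplified sketch of one of those passive probes, the freshness check that caught the silent failures; the table names and the _ingested_at column are illustrative, not the client's actual schema:

    import datetime as dt
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical landing tables and the longest gap each feed should ever show.
    SOURCES = {
        "landing.tms_orders":      dt.timedelta(hours=24),
        "landing.customs_entries": dt.timedelta(hours=24),
        "landing.fuel_card_feed":  dt.timedelta(days=7),
    }

    def freshness_report(now: dt.datetime) -> list:
        """Flag tables whose jobs 'succeed' while no fresh rows actually arrive."""
        findings = []
        for table, max_gap in SOURCES.items():
            last_load = (
                spark.table(table)
                .agg(F.max("_ingested_at").alias("last_load"))
                .collect()[0]["last_load"]
            )
            findings.append({
                "table": table,
                "last_load": last_load,
                "silently_stale": last_load is None or (now - last_load) > max_gap,
            })
        return findings

Schema drift can be tracked the same way: snapshot each table's schema on every run and diff it against the previous snapshot.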

We then designed the new architecture against the actual failure modes, not against a textbook lambda diagram.

"They didn't come in with a Databricks reference architecture. They came in with our actual error logs."
THE BUILD

Bronze → silver → gold on Delta, with Great Expectations gates between layers.

Standard medallion architecture, executed boringly well. Bronze ingests raw data with full provenance. Silver applies business rules and conforms keys across systems. Gold holds the analytics-ready aggregates.
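A minimal sketch of one hop through those layers on Delta; the paths, table names, keys, and rules are illustrative rather than the client's real ones:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Bronze: land the raw feed untouched, plus provenance columns.
    raw = (
        spark.read.json("/mnt/landing/tms/orders/")
        .withColumn("_source_file", F.input_file_name())
        .withColumn("_ingested_at", F.current_timestamp())
    )
    raw.write.format("delta").mode("append").saveAsTable("bronze.tms_orders")

    # Silver: apply business rules and conform keys across systems.
    silver = (
        spark.table("bronze.tms_orders")
        .dropDuplicates(["order_id"])
        .withColumn("shipment_key", F.upper(F.trim(F.col("order_id"))))  # shared key across sources
        .filter(F.col("delivered_at").isNotNull() | F.col("status").isin("open", "in_transit"))
    )
    silver.write.format("delta").mode("overwrite").saveAsTable("silver.shipments")

    # Gold: analytics-ready aggregate the dashboards read directly.
    gold = (
        spark.table("silver.shipments")
        .groupBy(F.to_date("delivered_at").alias("delivery_date"), "carrier_id")
        .agg(F.count("*").alias("shipments"), F.avg("transit_hours").alias("avg_transit_hours"))
    )
    gold.write.format("delta").mode("overwrite").saveAsTable("gold.daily_carrier_kpis")

The provenance columns on bronze are a large part of why "where did this number come from" became a quick question to answer.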

The non-standard part was the gates: every layer-to-layer transition has Great Expectations checks. If a batch breaches a defined quality threshold, it is held rather than promoted, so it can't poison downstream tables. Slack alerts reach the responsible team within 5 minutes; severe issues page the on-call engineer.
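The sketch below hand-rolls the gate logic just to show the flow; in the real pipeline the checks are Great Expectations suites, and the webhook URL, threshold, and rules here are illustrative:

    import json
    import urllib.request
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # per-team incoming webhook
    MAX_BAD_ROW_RATIO = 0.02                                 # gate threshold between layers

    def alert(text: str) -> None:
        """Post a plain-text message to the team's Slack channel."""
        req = urllib.request.Request(
            SLACK_WEBHOOK,
            data=json.dumps({"text": text}).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)

    def promote_if_clean(batch_table: str, target_table: str) -> bool:
        """Promote a batch to the next layer only if it passes the quality gate."""
        df = spark.table(batch_table)
        total = df.count()
        bad = df.filter(F.col("shipment_key").isNull() | (F.col("weight_kg") <= 0)).count()
        ratio = bad / total if total else 1.0
        if ratio > MAX_BAD_ROW_RATIO:
            # Hold the batch: nothing is written downstream, so gold stays clean.
            alert(f":rotating_light: {batch_table}: {bad}/{total} rows failed checks; batch held.")
            return False
        df.write.format("delta").mode("append").saveAsTable(target_table)
        return True

Holding the batch is the deliberate choice here: a held batch is loud and gets fixed the same day, while silently dropped or passed-through rows become next quarter's reconciliation problem.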

THE PILOT

5 weeks. 5 million rows ingested per month. The CFO finally trusted a number.

Within 5 weeks the new pipeline was the source of truth for 4 of the 6 source systems. Data freshness moved from a weekly batch, up to 5 days stale, to hourly. 90% of ingestion was fully automated; the remaining 10% (the customs feed) had a documented manual escape hatch with a clear SLA.

The most visible change was cultural: in the third executive review meeting after go-live, two department heads pulled up the same dashboard and got the same number. That hadn't happened in two years.

THE OUTCOME

Numbers that survived go-live.

  • Data freshness: 5 days → 1 hour (−97%)
  • Ingestion automated: 53% → 90% (+37 pt)
  • Records with quality issues: 22% → 1.4% (−94%)
  • Time to debug a wrong number: ~3 days → 20 min (−99%)
  • Conflicting executive numbers: weekly → 0
YOUR DATA PLATFORM

Numbers don't agree across departments? Data freshness measured in days?

20-min audit. We look at your top 3 dashboards and find the lineage gap. Often the fix is smaller than you think.

Take 2-min assessment