Data Platform Modernization: 6-Hour Reports Now Run in 4 Minutes
A mid-size insurance company was running on 10-year-old on-prem SQL Server with manual ETL jobs that broke every week. We migrated them to a lakehouse on AWS, and reports that took 6 hours now finish in 4 minutes.
The Challenge
What was getting in the way
- 01
The core data warehouse ran on SQL Server 2014 hosted on-prem. Queries for monthly reports took up to 6 hours
- 02
ETL jobs were built in SSIS and maintained by one person. When that person was on leave, broken jobs just stayed broken
- 03
No one trusted the data. Finance, underwriting, and claims teams each kept their own Excel spreadsheets as the 'real' numbers
The Solution
How we solved it
We migrated the data platform to a lakehouse architecture on AWS. Raw data lands in S3 via CDC connectors from the SQL Server and three other source systems. Spark jobs on EMR Serverless handle transforms, dbt manages the modeling layer, and everything lands in a Redshift Serverless warehouse for analysts. We replaced the SSIS jobs with Airflow DAGs with built-in alerting. The whole pipeline runs on a schedule with data quality checks at every stage. The team went from 'I don't trust these numbers' to self-service dashboards within 10 weeks.
Technologies
What We Built
A look inside the project
Illustration based on actual project deliverable
The Process
Step-by-step delivery
Assessment
Map all data sources, dependencies, and downstream consumers
CDC Setup
Connect change data capture from SQL Server and 3 other sources
Lakehouse Build
S3 raw layer + Spark transforms + dbt models + Redshift
Pipeline Orchestration
Replace SSIS with Airflow DAGs and data quality checks
Self-Service Analytics
Deploy dashboards and train the team on self-service queries
The Results
The numbers
Report Generation (was 6 hours)
Lower Infrastructure Costs
Full Migration Timeline