Migrating from Monolith to Microservices Without Breaking Everything
Migrating a monolith to microservices is one of those things that sound straightforward in a conference talk and become extremely messy in practice. We have done this four times now for different clients, and every project taught us something the previous one did not. Here is the playbook we have settled on.
Do Not Rewrite From Scratch
This is the most important rule, and almost every team ignores it the first time. The temptation is to build a clean microservices architecture from the ground up and then switch over. This approach fails roughly 80% of the time. You end up maintaining two systems for months, the new system has different bugs than the old one, and the business loses patience before the migration finishes.
Instead, use the strangler fig pattern. Extract services one at a time from the monolith. Each extraction is a small, reversible change. The monolith shrinks gradually until there is nothing left. It is slower but dramatically safer.
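At its core, the strangler fig pattern is a routing decision: requests for extracted functionality go to the new service, and everything else falls through to the monolith. A minimal sketch of that routing table (the service names and path prefixes are hypothetical examples):

```python
# Strangler fig routing: a sketch, not a production gateway.
# Backend URLs and path prefixes are illustrative.

MONOLITH = "http://monolith.internal"

# Grows one entry at a time as services are extracted.
# Deleting an entry rolls that traffic back to the monolith,
# which is what makes each extraction a reversible change.
EXTRACTED_ROUTES = {
    "/notifications": "http://notifications.internal",
    "/auth": "http://auth.internal",
}

def route(path: str) -> str:
    """Return the backend that should handle this request path."""
    for prefix, backend in EXTRACTED_ROUTES.items():
        if path.startswith(prefix):
            return backend
    return MONOLITH  # default: everything not yet extracted
```

The important property is the default case: anything you have not explicitly extracted keeps hitting the monolith, so the migration can pause at any point without breaking traffic.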
What to Extract First
Start with a service that has three properties: high change frequency (the team deploys changes to this code weekly or more), clear domain boundary (you can define its inputs and outputs cleanly), and low coupling to the rest of the system (it does not share database tables with ten other modules).
In practice, the first extraction candidates are usually: notification services, authentication services, file processing pipelines, or reporting modules. These tend to have well-defined interfaces and limited dependencies.
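The three properties above can be turned into a rough ranking heuristic when comparing candidates. The weights below are illustrative assumptions, not a formula we apply literally; the point is that database coupling should be penalized hard:

```python
# Rough extraction-candidate heuristic based on the three properties
# above. Weights and caps are made-up illustrations.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    deploys_per_month: int    # change frequency
    has_clear_boundary: bool  # can you define inputs/outputs cleanly?
    shared_tables: int        # DB tables shared with other modules

def extraction_score(c: Candidate) -> float:
    score = 0.0
    score += min(c.deploys_per_month, 10)   # reward frequent change, capped
    score += 5 if c.has_clear_boundary else 0
    score -= 2 * c.shared_tables            # penalize DB coupling heavily
    return score

candidates = [
    Candidate("notifications", deploys_per_month=8,
              has_clear_boundary=True, shared_tables=1),
    Candidate("core-billing", deploys_per_month=12,
              has_clear_boundary=False, shared_tables=9),
]
best = max(candidates, key=extraction_score)
```

Note how the heavily-deployed core-billing module still loses: nine shared tables outweigh its change frequency, which matches the advice above about not starting with core business logic.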
Do not start with the core business logic. That is always the most tangled part of the monolith, and extracting it first guarantees a painful experience.
The Database Is the Hard Part
Splitting code into services is relatively easy. Splitting the database is where things get complicated. If two services need to query the same table, you have not actually decoupled them. You just have a distributed monolith with network calls instead of function calls, which is strictly worse.
Our approach: before extracting a service, first refactor the monolith so that the relevant data access goes through a single internal module. Test that. Deploy it. Then extract that module into a service with its own database. The data migration happens as a separate step, usually with a period of dual-writes to avoid downtime.
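The dual-write window can be as simple as a repository wrapper that writes to both stores and reads from the old one until the new store is verified. A sketch, using plain dicts as stand-ins for the two databases (a real version needs error handling and a reconciliation job):

```python
# Dual-write sketch for the data-migration window. The dicts stand in
# for the monolith's table and the new service's database.

class DualWriteRepo:
    def __init__(self, legacy_db: dict, new_db: dict):
        self.legacy_db = legacy_db
        self.new_db = new_db
        self.read_from_new = False  # flip once the new store is verified

    def save(self, key, value):
        self.legacy_db[key] = value  # old store stays authoritative
        self.new_db[key] = value     # new store catches up in lockstep

    def get(self, key):
        source = self.new_db if self.read_from_new else self.legacy_db
        return source.get(key)
```

Cutover is then a two-step flip: switch reads to the new store, run with dual writes for a while longer, and finally drop the legacy write once you trust the new database.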
API Gateway From Day One
Put an API gateway in front of the monolith before you extract the first service. Route all traffic through it. When you extract a service, you change the routing in the gateway. The client never knows the difference. This also gives you a single place for authentication, rate limiting, and request logging across both the monolith and new services.
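Centralizing those cross-cutting concerns at the gateway might look like the sketch below. The auth check, rate limit, and log format are all illustrative stand-ins, not a real gateway implementation:

```python
# Gateway middleware sketch: auth, rate limiting, and request logging
# applied once, in front of both the monolith and extracted services.
# Token format and limits are hypothetical.

import time
from collections import defaultdict

request_log = []
request_counts = defaultdict(list)  # client_id -> request timestamps
RATE_LIMIT = 100                    # requests per 60s window (example)

def handle(client_id: str, token: str, path: str) -> int:
    """Return an HTTP-style status code for the request."""
    if not token.startswith("Bearer "):   # stand-in auth check
        return 401
    now = time.time()
    window = [t for t in request_counts[client_id] if now - t < 60]
    if len(window) >= RATE_LIMIT:
        return 429
    window.append(now)
    request_counts[client_id] = window
    request_log.append((client_id, path))  # single logging point
    return 200  # would forward to the monolith or a service here
```

Because every request passes through this one chokepoint, adding a new service never means re-implementing auth or rate limiting inside it.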
Observability Before You Need It
Distributed tracing, centralized logging, and health checks need to be in place before the second service goes live. Debugging a request that crosses three services without distributed tracing is a nightmare. We use OpenTelemetry for tracing, ship logs to a central store (usually Datadog or Grafana Cloud), and set up synthetic monitoring for every service endpoint.
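OpenTelemetry handles the mechanics for you, but the core idea behind distributed tracing is simply that every hop propagates the same request ID, so log lines from three services can be stitched back together. A stdlib sketch of just that propagation (the header name here is made up; OpenTelemetry uses the standard `traceparent` header):

```python
# The core idea behind distributed tracing: propagate one ID across
# every hop so centralized logs can be correlated per request.
# Header name is an illustrative stand-in for OTel's `traceparent`.

import uuid

TRACE_HEADER = "X-Request-Id"

def inbound(headers: dict) -> dict:
    """Adopt the caller's trace ID, or start a new trace at the edge."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    return {**headers, TRACE_HEADER: trace_id}

def log_line(headers: dict, service: str, msg: str) -> str:
    """Every log line carries the trace ID for central correlation."""
    return f"trace={headers[TRACE_HEADER]} service={service} {msg}"
```

When service A calls service B, it forwards the headers unchanged; B's `inbound` adopts the same ID, and a search for that ID in the central log store reconstructs the whole request path.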
Common Mistakes We Have Seen
Too many services too fast. One client went from 1 monolith to 22 microservices in 4 months. The team could not maintain them. Start with 3-5 services and grow only when there is a clear need.
Shared libraries that couple everything. If every service depends on a shared "common" package that changes weekly, you have recreated the monolith as a library. Keep shared code minimal and stable.
No service ownership. Every service needs a team that owns it. If nobody is responsible for a service, nobody monitors it, and it rots quietly until it breaks at 2am.
Synchronous calls everywhere. If Service A calls Service B which calls Service C synchronously, the whole chain is as slow and fragile as the weakest link. Use async messaging (Kafka, SQS) for anything that does not need an immediate response.
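The fix for the synchronous-chain problem can be illustrated with a tiny in-process queue: Service A publishes an event and responds immediately, and Service B consumes on its own schedule. Here `queue.Queue` is just a stand-in for a real broker like Kafka or SQS:

```python
# Async decoupling sketch: the producer returns immediately instead of
# waiting on the full A -> B -> C chain. queue.Queue stands in for a
# real message broker (Kafka, SQS).

import queue

order_events = queue.Queue()

def place_order(order_id: str) -> str:
    """Service A: publish the event and respond right away."""
    order_events.put({"type": "order_placed", "order_id": order_id})
    return "accepted"  # no blocking call into B or C

def process_events() -> list:
    """Service B: drain and handle events on its own schedule."""
    handled = []
    while not order_events.empty():
        handled.append(order_events.get())
    return handled
```

If Service B is down, orders still get accepted; the events simply wait on the queue, which is exactly the failure isolation the synchronous chain lacks.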
Timeline Expectations
For a mid-size monolith (200K-500K lines of code), plan for 6-12 months to extract the first 3-5 services. The first service takes the longest because you are also building the infrastructure: API gateway, CI/CD per service, observability stack, and the team's muscle memory for working with distributed systems. Services two through five go faster because the patterns are established.
The full migration typically takes 12-24 months. That sounds long, but remember: you are shipping improvements along the way. Each extracted service can be deployed independently, scaled independently, and developed by a team that moves faster because they own a smaller codebase.