Feb 4, 2025

How We Achieved Zero-Downtime Migrations

Migrating one database is hard. Migrating 10,000 tenant databases is a nightmare. Here is a look at the locking mechanisms and "blue-green" deployment strategies TenantsDB uses to update schemas without taking tenants offline.

The Multi-Tenant Challenge

Building a Software as a Service platform presents a unique infrastructure challenge. You are not just managing one large database. In a true multi-tenant architecture, you are often managing thousands of isolated databases, one for each customer.

This isolation is excellent for security and performance, but it turns schema migrations into a logistical nightmare. If you need to add a simple phone_number column to your User table, you cannot just run a single SQL command. You have to run that command against 10,000 separate databases.

If you attempt to run all of those updates at once, you risk what is known as the "thundering herd" problem. The spike in CPU and I/O from updating every tenant simultaneously can overwhelm your database cluster and cause a platform-wide outage.

The Queue and Lock Strategy

To solve this, we designed TenantsDB around an asynchronous migration engine. We treat schema changes not as immediate commands, but as jobs in a queue.

When you push a new OmniQL schema version, our engine does not apply it immediately. First, it calculates a deterministic "diff" between your new code and the current state of the database. It identifies exactly which tables need to be altered and generates the necessary non-blocking SQL commands.
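
To make the diff step concrete, here is a minimal Python sketch. It is not the TenantsDB engine: the dictionary-based schema format and the diff_schemas function are assumptions made for illustration, the table is named users because user is a reserved word in PostgreSQL, and only additive changes (new tables and new columns) are handled. Sorting the table and column names is what makes the output deterministic in this sketch.

```python
from typing import Dict, List

# Hypothetical schema representation: table name -> {column name -> SQL type}.
Schema = Dict[str, Dict[str, str]]


def diff_schemas(current: Schema, desired: Schema) -> List[str]:
    """Return the additive SQL statements that move `current` toward `desired`.

    Tables and columns are visited in sorted order, so the same pair of
    schemas always yields the same statement list. Drops and type changes
    are deliberately ignored in this sketch.
    """
    statements: List[str] = []
    for table in sorted(desired):
        wanted = desired[table]
        if table not in current:
            # Whole table is missing: create it with all wanted columns.
            cols = ", ".join(f"{name} {wanted[name]}" for name in sorted(wanted))
            statements.append(f"CREATE TABLE {table} ({cols})")
            continue
        existing = current[table]
        for name in sorted(wanted):
            if name not in existing:
                statements.append(
                    f"ALTER TABLE {table} ADD COLUMN {name} {wanted[name]}"
                )
    return statements


current = {"users": {"id": "bigint", "email": "text"}}
desired = {"users": {"id": "bigint", "email": "text", "phone_number": "text"}}

for statement in diff_schemas(current, desired):
    print(statement)
```

Because the plan depends only on the two schema descriptions, it can be computed once and replayed against every tenant that is still on the old version.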

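We explicitly avoid dangerous operations that hold write-blocking table locks for long periods. For example, when adding a column with a default value in PostgreSQL, we make sure the statement takes the constant-time, metadata-only path (available since PostgreSQL 11 for non-volatile defaults) rather than rewriting the whole table. The lock is then held only briefly, so the tenant's application does not freeze.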
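
One way such a safety check could look is sketched below in Python. The function names, the short list of volatile functions, and the two-second lock timeout are all assumptions for illustration, not TenantsDB internals. The sketch relies on two PostgreSQL behaviors: a default that calls a volatile function forces a table rewrite, and the lock_timeout setting makes a statement give up instead of queuing indefinitely behind other sessions.

```python
import re
from typing import List, Optional

# A few functions PostgreSQL marks VOLATILE. A column default that calls one
# of them forces a full table rewrite. Illustrative only, not exhaustive.
VOLATILE_FUNCTIONS = ("random", "clock_timestamp", "gen_random_uuid", "nextval")


def has_volatile_default(default_sql: str) -> bool:
    """True if the default expression calls one of the listed volatile functions."""
    return any(
        re.search(rf"\b{fn}\s*\(", default_sql, re.IGNORECASE)
        for fn in VOLATILE_FUNCTIONS
    )


def plan_add_column(
    table: str,
    column: str,
    sql_type: str,
    default_sql: Optional[str] = None,
    lock_timeout_ms: int = 2000,  # assumed value; tune per workload
) -> List[str]:
    """Build the statements for an ADD COLUMN that avoids a table rewrite."""
    if default_sql is not None and has_volatile_default(default_sql):
        raise ValueError(
            f"default {default_sql!r} is volatile and would rewrite {table}"
        )
    ddl = f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}"
    if default_sql is not None:
        ddl += f" DEFAULT {default_sql}"
    # Fail fast rather than wait behind long-running transactions.
    return [f"SET lock_timeout = '{lock_timeout_ms}ms'", ddl]


for statement in plan_add_column("users", "phone_number", "text", "''"):
    print(statement)
```

A rejected plan would need a different route, for example adding the column without a default and backfilling it in small batches.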

Batch Execution

Once the safe migration plan is generated, it is fed into our Migration Worker. This worker processes tenants in controlled batches, typically updating 50 databases at a time. This allows us to monitor the health of the cluster in real time.

If the database CPU spikes above a safety threshold, the worker automatically pauses the rollout to let the system recover. This ensures that your application remains responsive for active users, even while thousands of underlying database schemas are being actively modified in the background.
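
The batch-and-pause loop described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual Migration Worker: run_rollout, the apply and cpu_percent callbacks, the 80 percent threshold, and the five-second pause are hypothetical, and tenants within a batch are migrated one after another rather than concurrently. Only the batch size of 50 comes from the text above.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_rollout(
    tenants: Sequence[str],
    statements: List[str],
    apply: Callable[[str, List[str]], None],  # runs the plan on one tenant
    cpu_percent: Callable[[], float],         # current cluster CPU load
    batch_size: int = 50,
    cpu_limit: float = 80.0,                  # assumed safety threshold
    pause_seconds: float = 5.0,               # assumed back-off interval
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Migrate tenants batch by batch, pausing while CPU is above the limit."""
    migrated = 0
    for batch in batches(tenants, batch_size):
        # Hold the rollout until the cluster has recovered.
        while cpu_percent() > cpu_limit:
            sleep(pause_seconds)
        for tenant in batch:
            apply(tenant, statements)
            migrated += 1
    return migrated


# Demo with fakes: the first two CPU readings are too high, then load drops.
readings = iter([95.0, 90.0, 40.0])
applied: List[str] = []
pauses: List[float] = []

count = run_rollout(
    tenants=[f"tenant_{i}" for i in range(120)],
    statements=["ALTER TABLE users ADD COLUMN phone_number text"],
    apply=lambda tenant, plan: applied.append(tenant),
    cpu_percent=lambda: next(readings, 40.0),
    sleep=pauses.append,  # record pauses instead of actually sleeping
)
print(count, len(pauses))
```

Passing the CPU probe and the sleep function in as parameters keeps the loop testable without a real cluster; a production worker would also need retries and a record of which tenants have already been migrated.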

The Result

This architecture allows us to offer what we call "Zero-Downtime Migrations." You can deploy a new version of your application with a completely different data model, and TenantsDB handles the complexity of propagating those changes across your entire customer base. The result is a system where you can iterate on your product as fast as you write code, without the fear of breaking production for your most important clients.

Open-source universal query engine.

© 2026 Binary Leap OU
