Rolling deployment

A rolling deployment updates a software application gradually, replacing instances running the old version with instances running the new version a few at a time rather than all at once. During the deployment, both versions run simultaneously, and traffic shifts progressively until every instance runs the new version. This approach limits the impact of a bad release and allows updates without downtime.

Why it matters

Traditional "big bang" deployments take the entire system offline, update everything, and bring it back up. This creates downtime, and if something goes wrong, all users are affected immediately.

Rolling deployments avoid downtime by keeping enough instances in service at every step: an old instance is retired only as its replacement is confirmed healthy. Users experience continuous service throughout the deployment. If problems emerge, only the users routed to updated instances are affected, and the deployment can stop or roll back before damage spreads.

This gradual approach also reduces the blast radius of bugs. If the new version has a critical flaw, it's detected when serving a small percentage of traffic rather than after all users are affected. The combination of zero downtime and reduced risk makes rolling deployments standard practice for production systems.

How rolling deployments work

The basic mechanics involve:

Instance pool. The application runs on multiple instances (containers, servers, pods). A load balancer distributes traffic across them.

Sequential updates. One or more instances are taken out of the pool, updated to the new version, and returned to service.

Health verification. Before proceeding, the updated instances are verified to be healthy and serving traffic correctly.

Gradual progression. The process repeats until all instances run the new version.

Traffic management. The load balancer continues routing to healthy instances throughout, whether old or new version.
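The steps above can be sketched as a short loop. This is a minimal illustration, not any platform's implementation: `rolling_update`, `update`, and `is_healthy` are hypothetical names, and the two callbacks stand in for whatever your tooling uses to replace an instance and check its health.

```python
from typing import Callable, List


def rolling_update(
    instances: List[str],
    batch_size: int,
    update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update instances batch by batch and stop at the first unhealthy batch.

    Returns the instances that were updated and passed the health check.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    updated: List[str] = []
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for name in batch:
            update(name)  # sequential update: replace the old version
        if not all(is_healthy(name) for name in batch):
            break  # halt: later batches are never touched
        updated.extend(batch)  # gradual progression: move to the next batch
    return updated
```

Instances in batches after a failed one keep running the old version, which is what keeps the service available while the failure is investigated.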

Rolling deployment parameters

Several settings control rolling deployment behavior:

Batch size. How many instances update simultaneously. Smaller batches are safer but slower. "1 at a time" maximizes safety; "50% at a time" is faster.

Health check criteria. What must be true for an instance to be considered healthy? HTTP response codes, latency thresholds, custom checks.

Wait period. How long to observe new instances before proceeding. Longer waits catch problems; shorter waits speed deployment.

Failure threshold. How many instances can fail before halting the deployment? Zero tolerance is strictest; some tolerance accommodates transient failures such as a flaky health check.
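To make the tradeoff between these settings concrete, here is a small sketch of the arithmetic. The `RolloutParams` class and its field names are hypothetical, and the halting rule (halt once failures exceed the threshold) is an assumption; real platforms define their own semantics.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutParams:
    batch_size: int         # instances updated at the same time
    wait_seconds: float     # observation period after each batch
    failure_threshold: int  # failed instances tolerated before halting


def batch_count(instance_count: int, params: RolloutParams) -> int:
    """Number of batches needed to cover every instance."""
    return math.ceil(instance_count / params.batch_size)


def minimum_observation_time(instance_count: int, params: RolloutParams) -> float:
    """Lower bound on rollout duration: one wait period per batch."""
    return batch_count(instance_count, params) * params.wait_seconds


def should_halt(failed_instances: int, params: RolloutParams) -> bool:
    """Assumed rule: halt once failures exceed the tolerated number."""
    return failed_instances > params.failure_threshold
```

With 10 instances and a 60-second wait, updating one at a time spends at least 600 seconds observing, while updating five at a time spends at least 120 seconds.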

Rolling deployment challenges

Several factors complicate rolling deployments:

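Version compatibility. During deployment, old and new versions serve traffic simultaneously. They must be compatible with each other: each version must accept the API requests and data formats the other produces, and both must work against the database schema in place during the rollout.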

Session affinity. If users are sticky to specific instances, they might experience version inconsistency during deployment. Stateless designs avoid this.

Database migrations. Schema changes must work with both versions. This typically means deploying migrations separately from code.

Long-running processes. Requests or jobs in progress when an instance updates may fail. Graceful draining handles in-flight work.

Stateful applications. Applications with local state (caches, files) require special handling during instance replacement.
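One common way to handle the version compatibility challenge is for the new version to read both the old and the new data format, and to keep writing the old format until the rollout finishes. The sketch below assumes a hypothetical rename of a `name` field to `full_name`; the field and function names are illustrative only.

```python
from typing import Any, Dict


def read_display_name(payload: Dict[str, Any]) -> str:
    """Accept both the new field ("full_name") and the old one ("name")."""
    if "full_name" in payload:
        return payload["full_name"]
    return payload["name"]


def write_user(display_name: str) -> Dict[str, Any]:
    """Emit both fields while old instances may still read the payload."""
    return {"name": display_name, "full_name": display_name}
```

Once every instance runs the new version, a later release can stop writing the old field.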

Rolling vs. other deployment strategies

Rolling deployments are one of several approaches:

Strategy	Description	Tradeoffs
Rolling	Gradual instance replacement	Zero downtime, moderate complexity
Blue-Green	Two full environments, instant switch	Simple rollback, double resources
Canary	Small subset first, then full	Maximum control, more orchestration
Big Bang	All at once	Simple, but downtime and risk

Rolling deployments balance simplicity and safety. Blue-green offers cleaner separation but higher cost. Canary provides finer control but more complexity. Choose based on your risk tolerance, infrastructure, and operational capability.

Implementing rolling deployments

Modern platforms provide rolling deployment capabilities:

Kubernetes supports rolling updates natively with configurable parameters for max surge, max unavailable, and health checks.

AWS ECS offers rolling updates with deployment circuit breakers.

Cloud load balancers (AWS Application Load Balancer, Google Cloud Load Balancing) support gradual traffic shifting between backends through weighted routing.

Container orchestrators generally include rolling update capabilities as standard features.

Configuration typically involves:

Maximum instances unavailable during update

Maximum new instances created beyond normal capacity

Health check endpoints and criteria

Rollback triggers and thresholds
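The first two settings bound how far the instance count can drift during an update. As an illustration, the sketch below computes those bounds from percentages, following the rounding Kubernetes documents for its `maxSurge` (rounded up) and `maxUnavailable` (rounded down) settings. The function name and signature are hypothetical, and edge cases such as both values resolving to zero are not handled.

```python
from typing import Tuple


def rollout_bounds(
    replicas: int, max_surge_pct: int, max_unavailable_pct: int
) -> Tuple[int, int]:
    """Return (minimum available instances, maximum total instances)."""
    # Surge is rounded up, so a small percentage still allows one extra instance.
    surge = (replicas * max_surge_pct + 99) // 100
    # Unavailable is rounded down, so a small percentage may allow none.
    unavailable = replicas * max_unavailable_pct // 100
    return replicas - unavailable, replicas + surge
```

For 10 replicas with both settings at 25%, at least 8 instances stay available and at most 13 exist at once.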

Best practices

Several practices improve rolling deployment success:

Implement proper health checks. Shallow checks (is the port open?) miss problems that deeper checks (can we serve a real request?) catch.

Enable graceful shutdown. Instances should complete in-progress requests before terminating. Abrupt termination causes errors.

Design for backward compatibility. Both versions will run simultaneously. Plan for this explicitly.

Automate completely. Manual rolling deployments are error-prone. Automation ensures consistency.

Monitor actively. Watch error rates, latency, and business metrics during deployment. Pause if problems emerge.

Practice rollbacks. Ensure you can roll back quickly. Test rollback procedures before you need them.
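Graceful shutdown, in particular, comes down to two rules: stop accepting new work, then wait for in-flight work to finish before exiting. The class below is a minimal sketch of that idea using only the standard library; `DrainableWorker` and its method names are hypothetical, and a real service would call `drain` from its termination signal handler.

```python
import threading


class DrainableWorker:
    """Tracks in-flight requests so shutdown can wait for them."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._accepting = True

    def try_begin_request(self) -> bool:
        """Register a request; refuse it if shutdown has started."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        """Mark a request finished and wake any waiting drain call."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop accepting work; return True if all requests finished in time."""
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

If `drain` returns False, the timeout expired with requests still running, and the caller has to decide whether to wait longer or terminate anyway.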

Observability during rolling deployments

Visibility matters during the transition:

Version tagging. Tag metrics and logs by version. See how the new version behaves compared to old.

Error rate monitoring. Watch for spikes as new instances take traffic. Compare error rates between versions.

Performance comparison. Is the new version faster or slower? Detect regressions before completing deployment.

Business metrics. Conversion rates, successful transactions, and other business outcomes shouldn't degrade during deployment.
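Version tagging makes the comparison between versions straightforward to compute. The sketch below assumes request records are available as (version, HTTP status) pairs and treats any 5xx status as an error; the function names and the one-percentage-point tolerance are illustrative assumptions, not a recommended threshold.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def error_rate_by_version(requests: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Share of 5xx responses for each version seen in the records."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for version, status in requests:
        totals[version] += 1
        if status >= 500:
            errors[version] += 1
    return {version: errors[version] / totals[version] for version in totals}


def new_version_regressed(
    rates: Dict[str, float], old: str, new: str, tolerance: float = 0.01
) -> bool:
    """True if the new version's error rate exceeds the old one's by more than tolerance."""
    return rates[new] > rates[old] + tolerance
```

A check like this can feed the pause or rollback decision while only part of the fleet runs the new version.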

Tools like Klero help ensure that what you're deploying addresses real user needs. When rolling deployments deliver features customers actually want, the investment in safe deployment practices pays off in user value rather than just risk mitigation.
