Deployment Strategies
Devops advanced v1.0.0
Overview
Deployment strategies determine how a new application version reaches production while limiting risk and, for every strategy except Recreate, avoiding downtime. The right choice depends on risk tolerance, how quickly a rollback must complete, and what the infrastructure can support. In the full-lifecycle pipeline, @devops-engineer selects and configures the deployment strategy during Phase 11, based on the non-functional requirements (NFRs) in the Requirement Manifest.
Key Concepts
Strategy Comparison
| Strategy | Downtime | Rollback Speed | Risk | Resource Cost | Complexity |
|---|---|---|---|---|---|
| Rolling | Zero | Medium (gradual) | Low-Medium | 1x + buffer | Low |
| Blue-Green | Zero | Instant (switch) | Low | 2x | Medium |
| Canary | Zero | Fast (route shift) | Very Low | 1x + small % | High |
| Recreate | Yes | Slow (redeploy) | High | 1x | Very Low |
| Feature Flags | Zero | Instant (toggle) | Very Low | 1x | Medium |
Strategy Decision Matrix
┌─────────────────────────────────────────────────────────┐
│              Deployment Strategy Selection              │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Q: Can you afford 2x infrastructure?                   │
│  ├─ Yes → Blue-Green (safest, instant rollback)         │
│  └─ No                                                  │
│     Q: Need gradual rollout with metrics?               │
│     ├─ Yes → Canary (progressive, data-driven)          │
│     └─ No                                               │
│        Q: Stateless application?                        │
│        ├─ Yes → Rolling Update (default, simple)        │
│        └─ No → Recreate (stateful, accepts downtime)    │
│                                                         │
│  Always consider: Feature Flags for business logic      │
│  changes independent of deployment                      │
└─────────────────────────────────────────────────────────┘
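The decision matrix above can be read as a short chain of checks. The following is a minimal sketch of that reading; the `Constraints` class and its field names are hypothetical and simply mirror the three questions in the box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Hypothetical inputs mirroring the three questions in the matrix."""
    can_afford_double_infra: bool
    needs_metric_gated_rollout: bool
    stateless: bool


def select_strategy(c: Constraints) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if c.can_afford_double_infra:
        return "blue-green"
    if c.needs_metric_gated_rollout:
        return "canary"
    if c.stateless:
        return "rolling"
    return "recreate"
```

Feature flags are deliberately left out of the function: the matrix treats them as an addition to whichever strategy is chosen, not as an alternative to it.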
Blue-Green Deployment
            Load Balancer
             ┌─────────┐
             │  Route  │
             │  100%   │
             └────┬────┘
       ┌──────────┼──────────┐
       ▼                     ▼
┌──────────────┐      ┌──────────────┐
│  Blue (v1)   │      │  Green (v2)  │
│    ACTIVE    │      │   STANDBY    │
│              │      │              │
│ 3 instances  │      │ 3 instances  │
└──────────────┘      └──────────────┘
After validation: Route 100% → Green, Blue becomes standby
Rollback: Route 100% → Blue (instant)
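The cutover and rollback described above are the same operation applied in opposite directions. A toy model, assuming a router that sends all traffic to exactly one environment (the class and function names are illustrative, not tied to any real load balancer API):

```python
class BlueGreenRouter:
    """Toy model of a load balancer that sends 100% of traffic to one environment."""

    def __init__(self, active: str = "blue", standby: str = "green") -> None:
        self.active = active
        self.standby = standby

    def switch(self) -> str:
        """Swap active and standby; the same call performs cutover and rollback."""
        self.active, self.standby = self.standby, self.active
        return self.active


def cut_over(router: BlueGreenRouter, validate) -> bool:
    """Promote the standby environment only if validation passes.

    `validate` is a caller-supplied callable (for example a smoke test) that
    takes the environment name and returns True when it is safe to promote.
    """
    if not validate(router.standby):
        return False  # traffic never left the active environment
    router.switch()
    return True
```

Because the old environment stays running as standby, calling `switch()` a second time is the rollback path, which is why blue-green rollback is described as instant.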
Canary Deployment
           Load Balancer
┌─────────────────────────────────┐
│  95% → v1 (stable)              │
│   5% → v2 (canary)              │
└─────────────────────────────────┘
Phase 1: 5% canary → monitor 15 min → check metrics
Phase 2: 25% canary → monitor 15 min → check metrics
Phase 3: 50% canary → monitor 15 min → check metrics
Phase 4: 100% → v2 → remove v1
Auto-rollback if: error rate > 1% OR p99 > 500ms
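The phased plan and its rollback rule can be sketched as a small control loop. This is an illustration under stated assumptions, not a production controller: `set_weight` and `observe` are hypothetical hooks, and the thresholds are the ones listed above.

```python
from typing import Callable, NamedTuple, Sequence


class Metrics(NamedTuple):
    error_rate: float      # fraction of failed requests, e.g. 0.004 for 0.4%
    p99_latency_ms: float


MAX_ERROR_RATE = 0.01      # "error rate > 1%"
MAX_P99_MS = 500.0         # "p99 > 500ms"


def should_roll_back(m: Metrics) -> bool:
    return m.error_rate > MAX_ERROR_RATE or m.p99_latency_ms > MAX_P99_MS


def run_canary(
    set_weight: Callable[[int], None],
    observe: Callable[[int], Metrics],
    weights: Sequence[int] = (5, 25, 50),
) -> bool:
    """Step through the canary weights; return True if v2 was fully promoted.

    `set_weight` and `observe` are hypothetical hooks: the first shifts the
    given percentage of traffic to the canary, the second blocks for the
    monitoring window (15 minutes in the plan above) and returns metrics.
    """
    for weight in weights:
        set_weight(weight)
        if should_roll_back(observe(weight)):
            set_weight(0)   # send all traffic back to v1
            return False
    set_weight(100)         # Phase 4: promote v2
    return True
```

Both thresholds use strict comparisons, so a canary sitting exactly at 1% errors or 500 ms p99 is still allowed to proceed, matching the ">" in the rule.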
Best Practices
- Default to rolling updates: the simplest zero-downtime strategy for stateless apps
- Use blue-green for critical services: instant rollback is worth the 2x infrastructure cost
- Use canary for high-risk changes: new algorithms or changes to critical request paths, where real traffic is the only reliable test
- Use feature flags for business logic: decouple deployment from release
- Automate rollback triggers: error rate, latency, and health check thresholds
- Smoke test after deploy: hit critical endpoints in production immediately after each deploy
- Migrate the database first: apply backward-compatible schema changes before deploying code that depends on them, because old and new versions share the database during a rollout
- Use deployment slots or environment swaps on PaaS: Azure App Service deployment slots, AWS Elastic Beanstalk environment URL swaps
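As an illustration of the smoke-test practice above, here is a minimal sketch using only the Python standard library. The base URL and paths in the usage line are placeholders, and the `fetch` parameter is an assumption added so the check can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code          # server answered with a 4xx/5xx status
    except OSError:
        return 0                 # DNS failure, refused connection, timeout


def smoke_test(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the paths that did not answer with a 2xx status."""
    failures = []
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if not 200 <= status < 300:
            failures.append(path)
    return failures
```

A pipeline step could call `smoke_test("https://api.example.com", ["/health", "/api/orders"])` and trigger a rollback when the returned list is non-empty.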
Code Examples
✅ Good: ECS Rolling Update
# Terraform: ECS rolling deployment
resource "aws_ecs_service" "api" {
  name            = "api-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 3

  # ECS rounds the minimum up and the maximum down to whole tasks.
  deployment_minimum_healthy_percent = 66  # ceil(3 * 0.66) = 2 tasks stay running
  deployment_maximum_percent         = 150 # floor(3 * 1.50) = 4, i.e. 1 extra task (133 would round down to 3)

  deployment_circuit_breaker {
    enable   = true
    rollback = true # Auto-rollback on failure
  }

  # Placement strategies apply to the EC2 launch type only.
  ordered_placement_strategy {
    type  = "spread"
    field = "attribute:ecs.availability-zone"
  }
}
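The two percentage settings translate into whole task counts, and the rounding direction matters for small services. A short sketch of that arithmetic, assuming the rounding ECS documents for these settings (minimum rounded up, maximum rounded down); the function name is illustrative:

```python
def ecs_task_bounds(desired_count: int, min_healthy_percent: int, max_percent: int) -> tuple:
    """Return (min_running, max_running) task counts during a deployment.

    Integer arithmetic avoids floating-point surprises:
    the minimum is a ceiling division, the maximum a floor division.
    """
    min_running = -(-desired_count * min_healthy_percent // 100)  # round up
    max_running = desired_count * max_percent // 100              # round down
    return min_running, max_running
```

For a service with `desired_count = 3`, `ecs_task_bounds(3, 66, 133)` gives a ceiling of 3 tasks, so no extra task can start before an old one stops, while `ecs_task_bounds(3, 66, 150)` gives a ceiling of 4.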
✅ Good: Kubernetes Canary with Argo Rollouts
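# argo-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 5
  # selector and pod template (or workloadRef) omitted for brevity
  strategy:
    canary:
      # Exact traffic percentages require a trafficRouting provider (not shown);
      # without one, weights are approximated by replica counts.
      canaryService: api-service-canary
      stableService: api-service-stable
      steps:
        - setWeight: 5
        - pause: { duration: 15m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: api-service
        - setWeight: 25
        - pause: { duration: 15m }
        # Repeat the metric check at each phase, as in the rollout plan
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: api-service
        - setWeight: 50
        - pause: { duration: 15m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: api-service
        - setWeight: 100
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  # Arguments referenced as {{args.<name>}} must be declared here
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 60s
      # At least 99% of requests succeed (error rate of 1% or less)
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))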
✅ Good: Feature Flag Integration
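import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    // FeatureFlagClient and PaymentGateway are application-defined types (not shown).
    private final FeatureFlagClient featureFlags;
    private final PaymentGateway newPaymentGateway;
    private final PaymentGateway legacyPaymentGateway;

    public PaymentService(FeatureFlagClient featureFlags,
                          PaymentGateway newPaymentGateway,
                          PaymentGateway legacyPaymentGateway) {
        this.featureFlags = featureFlags;
        this.newPaymentGateway = newPaymentGateway;
        this.legacyPaymentGateway = legacyPaymentGateway;
    }

    public PaymentResult processPayment(PaymentRequest request) {
        if (featureFlags.isEnabled("new-payment-gateway", request.customerId())) {
            return newPaymentGateway.process(request); // Canary via flag
        }
        return legacyPaymentGateway.process(request); // Stable path
    }
}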
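The per-customer check in the example above implies that the flag client gives each customer a stable answer. One common way to get that behaviour is hash-based bucketing; the sketch below shows the idea and is an assumption about how such a client might work, not the API of any particular feature-flag product.

```python
import hashlib


def is_enabled(flag: str, customer_id: str, rollout_percent: int) -> bool:
    """Deterministically place a customer in or out of a percentage rollout.

    Hashing flag + customer ID gives each customer a stable bucket in 0..99,
    so the same customer always gets the same answer for a given percentage,
    and raising the percentage only ever adds customers.
    """
    digest = hashlib.sha256(f"{flag}:{customer_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```

Including the flag name in the hash keeps rollouts independent: a customer who lands in the first 5% for one flag is not automatically in the first 5% for every other flag.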
❌ Bad: Deployment Anti-Patterns
# Manual deployment with downtime
ssh production-server
docker stop myapp
docker pull myapp:latest
docker run myapp:latest
# 30 seconds of downtime, no rollback plan
Anti-Patterns
- Big bang deployments: replacing all instances at once with no rollback path
- Manual deploys: SSH plus ad hoc docker commands, which cannot be reproduced or audited
- No health checks: the new version starts receiving traffic before it is ready
- Breaking database changes: for example, renaming a column without keeping the old name available to the previous version
- No monitoring during rollout: deploying and walking away
- Coupling deploy to release: exposing a feature to users in the same step that ships its code, instead of gating it behind a feature flag
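Several of the anti-patterns above (manual steps, no health checks, no rollback) disappear once the deploy is a function that pins a version, gates on health, and reverts on failure. A minimal sketch, assuming hypothetical `release` and `probe` hooks onto whatever platform performs the rollout:

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a health probe until it passes or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


def deploy(
    new_version: str,
    previous_version: str,
    release: Callable[[str], None],
    probe: Callable[[], bool],
    **wait_kwargs,
) -> str:
    """Release a pinned version, gate on health, and roll back on failure.

    Returns the version left serving traffic. `release` and `probe` are
    hypothetical hooks onto whatever platform performs the rollout.
    """
    release(new_version)
    if wait_until_healthy(probe, **wait_kwargs):
        return new_version
    release(previous_version)
    return previous_version
```

Passing explicit version strings rather than a moving tag such as `latest` is what makes the rollback target well defined.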
Testing Strategies
- Deployment dry runs: run the full deployment pipeline against staging before production
- Chaos engineering: inject failures during the canary phase to confirm that the automated rollback triggers fire
- Load testing during rollout: verify performance as traffic shifts progressively to the new version
- Rollback drills: rehearse the rollback procedure on a fixed schedule, such as monthly