Deployment Strategies

Overview

Deployment strategies determine how new application versions are released to production. Most aim for zero downtime at low risk; they differ in rollback speed, resource cost, and complexity. The right strategy depends on risk tolerance, rollback speed requirements, and infrastructure capabilities. In the full-lifecycle pipeline, @devops-engineer selects and configures the deployment strategy during Phase 11 based on the non-functional requirements (NFRs) in the Requirement Manifest.

Key Concepts

Strategy Comparison

Strategy	Downtime	Rollback Speed	Risk	Resource Cost	Complexity
Rolling	Zero	Medium (gradual)	Low-Medium	1x + buffer	Low
Blue-Green	Zero	Instant (switch)	Low	2x	Medium
Canary	Zero	Fast (route shift)	Very Low	1x + small %	High
Recreate	Yes	Slow (redeploy)	High	1x	Very Low
Feature Flags	Zero	Instant (toggle)	Very Low	1x	Medium

Strategy Decision Matrix

┌─────────────────────────────────────────────────────────┐
│              Deployment Strategy Selection               │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Q: Can you afford 2x infrastructure?                    │
│  ├─ Yes → Blue-Green (full cutover, instant rollback)    │
│  └─ No                                                   │
│      Q: Need gradual rollout with metrics?               │
│      ├─ Yes → Canary (progressive, data-driven)          │
│      └─ No                                               │
│          Q: Stateless application?                       │
│          ├─ Yes → Rolling Update (default, simple)       │
│          └─ No → Recreate (stateful, accepts downtime)   │
│                                                          │
│  Always consider: Feature Flags for business logic       │
│  changes independent of deployment                       │
└─────────────────────────────────────────────────────────┘
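
The selection questions in the matrix can be sketched as a small function. This is only an illustration of the decision order; the class, enum, and parameter names are hypothetical and not part of any pipeline tooling.

```java
final class StrategySelector {
    enum Strategy { BLUE_GREEN, CANARY, ROLLING, RECREATE }

    /** Asks the three questions in the same order as the matrix. */
    static Strategy select(boolean canAffordDoubleInfra,
                           boolean needsGradualRollout,
                           boolean stateless) {
        if (canAffordDoubleInfra) {
            return Strategy.BLUE_GREEN;   // Full standby environment, instant rollback
        }
        if (needsGradualRollout) {
            return Strategy.CANARY;       // Progressive, metrics-driven rollout
        }
        // Stateless apps can roll; stateful apps fall back to recreate and accept downtime
        return stateless ? Strategy.ROLLING : Strategy.RECREATE;
    }
}
```

Feature flags are deliberately left out of the return values: they are an orthogonal choice that can be combined with any of the four strategies.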

Blue-Green Deployment

                    Load Balancer
                    ┌─────────┐
                    │  Route   │
                    │  100%    │
                    └────┬────┘
              ┌──────────┼──────────┐
              ▼                     ▼
    ┌──────────────┐      ┌──────────────┐
    │   Blue (v1)  │      │  Green (v2)  │
    │   ACTIVE     │      │  STANDBY     │
    │              │      │              │
    │  3 instances │      │  3 instances │
    └──────────────┘      └──────────────┘
    
    After validation: Route 100% → Green, Blue becomes standby
    Rollback: Route 100% → Blue (instant)
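
The switch and rollback above amount to flipping a single pointer. A minimal sketch of that idea follows, assuming a hypothetical in-memory router; a real setup would change a load balancer target, DNS record, or service selector instead.

```java
import java.util.concurrent.atomic.AtomicReference;

final class BlueGreenRouter {
    enum Env { BLUE, GREEN }

    // The environment currently receiving 100% of traffic; Blue (v1) starts active.
    private final AtomicReference<Env> active = new AtomicReference<>(Env.BLUE);

    Env active() {
        return active.get();
    }

    /** Cuts over to the standby environment only if it passed validation. */
    boolean promote(boolean standbyValidated) {
        if (!standbyValidated) {
            return false;            // Keep traffic where it is
        }
        flip();
        return true;
    }

    /** Rollback is the same flip in the opposite direction, which is why it is instant. */
    void rollback() {
        flip();
    }

    private void flip() {
        active.updateAndGet(e -> e == Env.BLUE ? Env.GREEN : Env.BLUE);
    }
}
```

Because the previous environment keeps running as standby, no redeploy is needed to roll back. The cost is the second full set of instances.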

Canary Deployment

    Load Balancer
    ┌─────────────────────────────────┐
    │  95% → v1 (stable)             │
    │   5% → v2 (canary)             │
    └─────────────────────────────────┘

    Phase 1:  5% canary  → monitor 15 min → check metrics
    Phase 2: 25% canary  → monitor 15 min → check metrics
    Phase 3: 50% canary  → monitor 15 min → check metrics
    Phase 4: 100% → v2   → remove v1
    
    Auto-rollback if: error rate > 1% OR p99 > 500ms
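
The phase progression and rollback rule above can be sketched as follows. The weights and thresholds come from the phases listed; the class and method names are hypothetical, and the metrics are passed in rather than fetched from a monitoring system.

```java
import java.util.List;

final class CanaryRollout {
    static final List<Integer> WEIGHTS = List.of(5, 25, 50, 100);
    static final double MAX_ERROR_RATE = 0.01;    // 1%
    static final double MAX_P99_MILLIS = 500.0;

    /** Canary metrics observed during one 15-minute monitoring window. */
    static final class Metrics {
        final double errorRate;
        final double p99Millis;

        Metrics(double errorRate, double p99Millis) {
            this.errorRate = errorRate;
            this.p99Millis = p99Millis;
        }
    }

    /** Either threshold being exceeded triggers a rollback. */
    static boolean shouldRollBack(Metrics m) {
        return m.errorRate > MAX_ERROR_RATE || m.p99Millis > MAX_P99_MILLIS;
    }

    /**
     * observed.get(i) holds the metrics gathered while the canary served
     * WEIGHTS.get(i) percent of traffic. Returns the final canary weight:
     * 100 if all three monitored phases passed, 0 if a rollback was triggered.
     */
    static int run(List<Metrics> observed) {
        for (int i = 0; i < WEIGHTS.size() - 1; i++) {
            if (shouldRollBack(observed.get(i))) {
                return 0;
            }
        }
        return 100;
    }
}
```

The last weight (100) is not monitored in this sketch because it represents full promotion, matching Phase 4 above.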

Best Practices

Default to rolling updates — Simplest zero-downtime strategy for stateless apps
Use blue-green for critical services — Instant rollback is worth the cost
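Canary for high-risk changes — Algorithm changes and other behavior that is hard to validate before production; canary and stable versions share one database, so schema changes must still be backward-compatible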
Feature flags for business logic — Decouple deployment from release
Automate rollback triggers — Error rate, latency, health check thresholds
Smoke test after deploy — Hit critical endpoints in production post-deploy
Database-first migration — Apply backward-compatible schema changes before deploying the code that depends on them
Use deployment slots (PaaS) — Azure App Service deployment slots, AWS Elastic Beanstalk environment URL swaps
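
The smoke-test practice in the list above could look roughly like the sketch below. The class name and the endpoint paths in the usage note are hypothetical. The status lookup is injected as a function so the check can wrap any HTTP client, or a stub in tests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class SmokeTest {
    /**
     * Returns the paths that did not answer with a 2xx status.
     * statusOf maps a path to the HTTP status code received for it.
     */
    static List<String> failures(List<String> criticalPaths, ToIntFunction<String> statusOf) {
        List<String> failed = new ArrayList<>();
        for (String path : criticalPaths) {
            int status;
            try {
                status = statusOf.applyAsInt(path);
            } catch (RuntimeException e) {
                status = -1;          // Treat connection errors as failures
            }
            if (status < 200 || status >= 300) {
                failed.add(path);
            }
        }
        return failed;
    }
}
```

A pipeline step might call `SmokeTest.failures(List.of("/health", "/api/orders"), client)` right after the deploy and trigger a rollback when the returned list is not empty.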

Code Examples

✅ Good: ECS Rolling Update

# Terraform — ECS rolling deployment
resource "aws_ecs_service" "api" {
  name            = "api-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 3

  deployment_minimum_healthy_percent = 66    # ceil(3 * 0.66) = 2 tasks must stay running
  deployment_maximum_percent         = 150   # floor(3 * 1.50) = 4, allows 1 extra task (133 would round down to 3)

  deployment_circuit_breaker {
    enable   = true
    rollback = true    # Auto-rollback on failure
  }

  # Placement strategies apply to the EC2 launch type only; omit this block on Fargate
  ordered_placement_strategy {
    type  = "spread"
    field = "attribute:ecs.availability-zone"
  }
}
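
The two deployment percentages are converted to task counts with different rounding, which matters for small services. ECS rounds the minimum healthy count up and the maximum count down. The helper below is a sketch of that arithmetic only; the class and method names are hypothetical.

```java
final class EcsDeploymentBounds {
    /** Tasks that must stay running: the percentage of desiredCount, rounded up. */
    static int minHealthyTasks(int desiredCount, int minimumHealthyPercent) {
        return (desiredCount * minimumHealthyPercent + 99) / 100;
    }

    /** Upper limit on running plus pending tasks: the percentage of desiredCount, rounded down. */
    static int maxTasks(int desiredCount, int maximumPercent) {
        return desiredCount * maximumPercent / 100;
    }
}
```

With a desired count of 3, a maximum of 133 percent rounds down to 3 tasks, so no new task can start until an old one stops. A maximum of 150 percent gives 4 tasks, which leaves room for one extra task during the rollout.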

✅ Good: Kubernetes Canary with Argo Rollouts

# argo-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 5
  strategy:
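    canary:
      # No trafficRouting block is configured, so weights are approximated by
      # pod counts; with 5 replicas the smallest canary step is 1 pod (about 20%).
      steps: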
        - setWeight: 5
        - pause: { duration: 15m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: api-service
        - setWeight: 25
        - pause: { duration: 15m }
        - setWeight: 50
        - pause: { duration: 15m }
        - setWeight: 100
      canaryService: api-service-canary
      stableService: api-service-stable
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
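spec:
  args:
    - name: service-name         # Declared so {{args.service-name}} resolves in the query
  metrics:
    - name: success-rate
      interval: 60s
      count: 5                   # Finite count so the inline analysis step can complete
      successCondition: result[0] >= 0.99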
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))

✅ Good: Feature Flag Integration

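import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    // FeatureFlagClient and PaymentGateway are application-defined types.
    private final FeatureFlagClient featureFlags;
    private final PaymentGateway newPaymentGateway;
    private final PaymentGateway legacyPaymentGateway;

    // Constructor injection; the two gateway beans may need @Qualifier to disambiguate.
    public PaymentService(FeatureFlagClient featureFlags,
                          PaymentGateway newPaymentGateway,
                          PaymentGateway legacyPaymentGateway) {
        this.featureFlags = featureFlags;
        this.newPaymentGateway = newPaymentGateway;
        this.legacyPaymentGateway = legacyPaymentGateway;
    }

    public PaymentResult processPayment(PaymentRequest request) {
        if (featureFlags.isEnabled("new-payment-gateway", request.customerId())) {
            return newPaymentGateway.process(request);   // Canary via flag
        }
        return legacyPaymentGateway.process(request);    // Stable path
    }
}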
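
One way a flag client could support a per-customer canary is percentage bucketing with a stable hash. The sketch below is an assumption about how such a client might work, not the API of any particular feature-flag product. `PercentageFlagClient` is a hypothetical name, and customer IDs are assumed to be strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;

final class PercentageFlagClient {
    private final Map<String, Integer> rolloutPercent;   // Flag name -> 0..100

    PercentageFlagClient(Map<String, Integer> rolloutPercent) {
        this.rolloutPercent = Map.copyOf(rolloutPercent);
    }

    /** Enabled when the customer's bucket falls below the flag's rollout percentage. */
    boolean isEnabled(String flag, String customerId) {
        int percent = rolloutPercent.getOrDefault(flag, 0);   // Unknown flags are off
        return bucket(flag, customerId) < percent;
    }

    /** Stable bucket in 0..99: the same flag and customer always map to the same value. */
    static int bucket(String flag, String customerId) {
        CRC32 crc = new CRC32();
        crc.update((flag + ":" + customerId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }
}
```

Because the bucket is deterministic, raising the percentage only adds customers to the new path and never moves an already enabled customer back. Setting the percentage to 0 acts as the instant toggle-off.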

❌ Bad: Deployment Anti-Patterns

# Manual deployment with downtime
ssh production-server
docker stop myapp
docker pull myapp:latest
docker run myapp:latest
# 30 seconds of downtime, no rollback plan

Anti-Patterns

Big bang deployments — Replace all instances at once with no rollback
Manual deploys — SSH + docker commands; unreproducible
No health checks — New version starts receiving traffic before ready
Breaking database changes — Column rename without backward compatibility
No monitoring during rollout — Deploy and walk away
Coupling deploy to release — Shipping code and exposing the feature in one step; feature flags separate the two

Testing Strategies

Deployment dry runs — Test deployment pipeline against staging
Chaos engineering — Simulate failures during canary phase
Load testing during rollout — Verify performance under progressive traffic shift
Rollback drills — Practice rollback procedure monthly
