AWS bills can escalate quickly as applications scale. Reserved Instances and Savings Plans offer discounts, but they require long-term commitments. For application services that scale dynamically, Spot capacity is a powerful alternative: EC2 Spot Instances are discounted by up to 90% and Fargate Spot by up to 70% compared to On-Demand pricing.
However, running Spot in production comes with a major catch: AWS can reclaim Spot capacity with only a 2-minute warning when it needs that capacity back. In this guide, we'll design a hybrid ECS Capacity Provider strategy that blends On-Demand and Spot capacity to maintain uptime while reducing costs.
Cost-Efficiency Goal: Run baseline critical services on On-Demand, and direct most of the dynamic horizontal scaling capacity to Spot.
The Strategy: Capacity Providers
ECS Capacity Providers allow us to define rules for how tasks are placed. A capacity provider strategy is controlled by two settings:
- Base: The minimum number of tasks that must run on a capacity provider. We'll set a base of 2 on On-Demand to guarantee that we always have at least two containers running, even if Spot capacity is fully reclaimed by AWS.
- Weight: The relative proportion of tasks launched on each provider once the base is satisfied. We will use a 1:3 ratio (1 On-Demand for every 3 Spot tasks).
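To make the base and weight arithmetic concrete, here is a small illustrative sketch written as Terraform locals. ECS performs this split itself, so none of this belongs in a real configuration; the local and output names are hypothetical, and the desired count of 10 is chosen only because it divides evenly.

```hcl
# Illustrative only: reproduces the base/weight arithmetic for a hypothetical
# desired count of 10. All names below are made up for this example.
locals {
  desired_count    = 10
  on_demand_base   = 2
  on_demand_weight = 1
  spot_weight      = 3

  # Tasks left to distribute once the base is satisfied: 10 - 2 = 8
  remaining = local.desired_count - local.on_demand_base

  # Split the remainder by weight: 8 * 1/4 = 2 On-Demand, 8 * 3/4 = 6 Spot
  on_demand_weighted = local.remaining * local.on_demand_weight / (local.on_demand_weight + local.spot_weight)
  spot_weighted      = local.remaining * local.spot_weight / (local.on_demand_weight + local.spot_weight)
}

output "on_demand_tasks" {
  value = local.on_demand_base + local.on_demand_weighted # 4
}

output "spot_tasks" {
  value = local.spot_weighted # 6
}
```

With 10 tasks, 4 land on On-Demand (2 base plus 2 weighted) and 6 on Spot. When the remainder does not divide evenly by the total weight, ECS has to round, so treat the split as approximate in those cases.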
Step 1: Terraform Capacity Provider Configuration
Let's codify this strategy using Terraform. We'll configure an ECS cluster that utilizes both FARGATE and FARGATE_SPOT capacity providers.
# Create the main ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "production-cluster"
}

# Associate Capacity Providers with the Cluster
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = aws_ecs_cluster.main.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 2
    weight            = 1
  }

  default_capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    base              = 0
    weight            = 3
  }
}
Step 2: Assigning Strategy to ECS Service
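When launching our application service, we set a capacity provider strategy on the service itself. Here it mirrors the cluster default, and stating it explicitly keeps the placement rules visible next to the service. As the service scales out (e.g. from 2 to 10 tasks), new tasks are placed according to these rules.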
resource "aws_ecs_service" "api" {
  name            = "production-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 8

  network_configuration {
    subnets         = ["subnet-xxxx", "subnet-yyyy"]
    security_groups = ["sg-xxxx"]
  }

  # Define capacity strategy overrides
  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 2
    weight            = 1
  }

  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    base              = 0
    weight            = 3
  }
}
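The scale-out itself (e.g. from 2 to 10 tasks) could be driven by Application Auto Scaling. The following is a minimal sketch, assuming target tracking on average CPU utilization. The resource names, the policy name, the 2 to 10 bounds, and the 60% target are illustrative assumptions, not values from our setup.

```hcl
# Hypothetical scaling target: lets Application Auto Scaling adjust the
# service's desired count between 2 and 10 tasks.
resource "aws_appautoscaling_target" "api" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.api.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10
}

# Hypothetical target-tracking policy: adds or removes tasks to keep average
# CPU utilization near the (assumed) 60% target.
resource "aws_appautoscaling_policy" "api_cpu" {
  name               = "production-api-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60
  }
}
```

If you adopt something like this, a common companion change is to add `lifecycle { ignore_changes = [desired_count] }` to the service so that Terraform does not reset the count the autoscaler has chosen.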
Handling Interruption: Graceful Shutdowns
Because Spot capacity can be reclaimed at any time, your applications must be stateless and handle shutdowns gracefully. When AWS reclaims Fargate Spot capacity, the two-minute warning arrives as a task state change event in EventBridge and as a SIGTERM signal sent to the running task.
To ensure active connections aren't dropped, configure these settings on your load balancer target group and in your container definition:
- Tune the deregistration delay: Set your target group's deregistration_delay.timeout_seconds to 30-60 seconds (the default is 300). During this window the load balancer stops sending new requests to the terminating container while it finishes processing active connections.
- Configure container stop timeout: Set stopTimeout in your ECS container definition to 30 seconds (the Fargate default; the maximum is 120). This gives your application process time to handle the SIGTERM signal, complete open requests, close database connections, and exit cleanly before receiving a SIGKILL.
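In Terraform, the two settings above might look like the following sketch. It is not our exact configuration: the `vpc_id` variable, the target group name, port 8080, the CPU and memory sizes, and the image URI are all placeholders introduced for illustration. Wiring the target group into the service through a `load_balancer` block is not shown.

```hcl
# Hypothetical input: the VPC that hosts the load balancer and tasks.
variable "vpc_id" {
  type = string
}

# Target group with a shortened deregistration delay (the default is 300 seconds).
resource "aws_lb_target_group" "api" {
  name        = "production-api"
  port        = 8080 # assumed application port
  protocol    = "HTTP"
  target_type = "ip" # Fargate tasks use awsvpc networking, so targets are IPs
  vpc_id      = var.vpc_id

  deregistration_delay = 30
}

# Task definition whose container gets 30 seconds between SIGTERM and SIGKILL.
resource "aws_ecs_task_definition" "api" {
  family                   = "production-api"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256 # placeholder sizing
  memory                   = 512 # placeholder sizing

  container_definitions = jsonencode([
    {
      name        = "api"
      image       = "registry.example.com/api:latest" # placeholder image URI
      essential   = true
      stopTimeout = 30
      portMappings = [
        { containerPort = 8080, protocol = "tcp" }
      ]
    }
  ])
}
```

A private registry or log driver would additionally require an `execution_role_arn`, which is omitted here to keep the sketch short.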
Result & Savings Metrics
By implementing this strategy on our API tier services, we saw the following outcome:
{
"cluster": "production-cluster",
"metrics": {
"pre_migration_cost_monthly": "$4,200.00",
"post_migration_cost_monthly": "$2,730.00",
"savings_percentage": "35%",
"interruption_replacement_avg_seconds": "42s",
"interruption_dropped_requests": "0"
}
}
Conclusion
Combining ECS Capacity Providers with Spot capacity allows you to scale cost-effectively. By automating the placement ratio with Terraform and structuring your services to exit gracefully on SIGTERM, you can cut compute costs substantially (35% in our case) without compromising production availability.
Review your scaling limits and apply this strategy to your dev, staging, and stateless production tiers!