Deploying Applications to AWS ECS with Terraform: Infrastructure as Code Guide
Introduction
Deploying containerized applications to AWS can be complex, involving multiple interconnected components: ECS task definitions, load balancers, target groups, listener rules, and DNS configuration. While you could manage these with ad hoc CLI commands, Infrastructure as Code (IaC) with Terraform gives you versioning, reproducibility, and team collaboration.
In this guide, I'll show you how to deploy any containerized application to AWS ECS with Application Load Balancer and custom domain configuration—all defined in Terraform.
What We'll Build
By the end of this tutorial, you'll have Terraform code that creates:
✅ ECS Task Definition (containerized app blueprint)
✅ ECS Service (manages running containers)
✅ ALB Target Group (health checks & routing)
✅ ALB Listener Rule (custom domain routing)
✅ Route53 DNS Record (points domain to ALB)
✅ CloudWatch Log Group (centralized logging)
Best part: All infrastructure is versioned, reviewable, and reproducible!
Architecture Overview
User Request (https://myapp.yourdomain.com)
↓
Route53 DNS (Terraform managed)
↓
Application Load Balancer
↓
ALB Listener Rule (Terraform managed)
↓
Target Group (Terraform managed)
↓
ECS Service (Terraform managed)
↓
ECS Tasks (Docker Containers)
Prerequisites
Terraform installed (v1.0+)
AWS CLI configured with credentials
Docker image in ECR or Docker Hub
Existing AWS infrastructure:
VPC with subnets
ECS cluster
Application Load Balancer
Route53 hosted zone
Project Structure
terraform/
├── main.tf # Main resource definitions
├── variables.tf # Input variables
├── outputs.tf # Output values
├── terraform.tfvars # Variable values (gitignored)
├── versions.tf # Provider versions
└── data.tf # Data sources (existing resources)
Step 1: Set Up Terraform Configuration
versions.tf - Provider Configuration
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
# Optional: Remote state backend
backend "s3" {
bucket = "my-terraform-state"
key = "ecs/my-app/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = var.environment
Project = var.project_name
ManagedBy = "Terraform"
}
}
}
Why remote backend?
✅ Team collaboration (shared state)
✅ State locking (prevents conflicts)
✅ Encryption at rest
✅ Version history
variables.tf - Input Variables
variable "aws_region" {
description = "AWS region"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name (e.g., prod, staging)"
type = string
}
variable "project_name" {
description = "Project name for resource naming"
type = string
}
variable "app_name" {
description = "Application name"
type = string
}
variable "app_image" {
description = "Docker image URL"
type = string
}
variable "app_port" {
description = "Port the application listens on"
type = number
default = 8080
}
variable "app_cpu" {
description = "CPU units for the task (1024 = 1 vCPU)"
type = number
default = 256
}
variable "app_memory" {
description = "Memory for the task (MiB)"
type = number
default = 512
}
variable "desired_count" {
description = "Number of tasks to run"
type = number
default = 2
}
variable "health_check_path" {
description = "Health check endpoint"
type = string
default = "/"
}
variable "health_check_matcher" {
description = "Expected HTTP status codes"
type = string
default = "200"
}
variable "domain_name" {
description = "Custom domain for the app (e.g., myapp.example.com)"
type = string
}
variable "hosted_zone_name" {
description = "Route53 hosted zone (e.g., example.com)"
type = string
}
variable "vpc_id" {
description = "VPC ID where resources will be created"
type = string
}
variable "ecs_cluster_name" {
description = "Name of existing ECS cluster"
type = string
}
variable "alb_arn" {
description = "ARN of existing Application Load Balancer"
type = string
}
variable "alb_listener_arn" {
description = "ARN of ALB HTTPS listener (port 443)"
type = string
}
variable "environment_variables" {
description = "Environment variables for the container"
type = map(string)
default = {}
}
variable "secrets" {
description = "Secrets from AWS Secrets Manager"
type = list(object({
name = string
valueFrom = string
}))
default = []
}
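Optionally, you can catch bad values at plan time by extending the app_cpu variable above with a validation block — a sketch, my addition rather than a requirement (on EC2 any CPU value works, but Fargate only accepts specific sizes):

```hcl
variable "app_cpu" {
  description = "CPU units for the task (1024 = 1 vCPU)"
  type        = number
  default     = 256

  # Fargate only supports these CPU sizes; a harmless extra check on EC2
  validation {
    condition     = contains([256, 512, 1024, 2048, 4096], var.app_cpu)
    error_message = "app_cpu must be one of 256, 512, 1024, 2048, or 4096."
  }
}
```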
terraform.tfvars - Variable Values
aws_region = "us-east-1"
environment = "production"
project_name = "my-company"
app_name = "my-app"
# Docker image
app_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest"
# Container configuration
app_port = 8080
app_cpu = 512
app_memory = 1024
desired_count = 2
# Health check
health_check_path = "/health"
health_check_matcher = "200,302" # Accept 200 OK and 302 redirects
# DNS
domain_name = "myapp.example.com"
hosted_zone_name = "example.com"
# Existing infrastructure
vpc_id = "vpc-0123456789abcdef0"
ecs_cluster_name = "production-cluster"
alb_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/production-alb/abc123"
alb_listener_arn = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/production-alb/abc123/def456"
# Environment variables
environment_variables = {
ENV = "production"
LOG_LEVEL = "info"
PORT = "8080"
}
# Secrets (stored in AWS Secrets Manager)
secrets = [
{
name = "DATABASE_URL"
valueFrom = "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/database-url-abc123"
},
{
name = "API_KEY"
valueFrom = "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/api-key-def456"
}
]
🔒 Important: Add terraform.tfvars to .gitignore - it contains sensitive values!
Step 2: Data Sources for Existing Resources
data.tf - Query Existing Infrastructure
# Get existing ECS cluster
data "aws_ecs_cluster" "main" {
cluster_name = var.ecs_cluster_name
}
# Get existing ALB
data "aws_lb" "main" {
arn = var.alb_arn
}
# Get Route53 hosted zone
data "aws_route53_zone" "main" {
name = var.hosted_zone_name
private_zone = false
}
# Get VPC
data "aws_vpc" "main" {
id = var.vpc_id
}
# Get current AWS account
data "aws_caller_identity" "current" {}
# Get current AWS region
data "aws_region" "current" {}
Why use data sources?
✅ Reference existing infrastructure without hardcoding ARNs
✅ Validate resources exist before creating new ones
✅ Get dynamic values (like ALB DNS name)
Step 3: Main Infrastructure Resources
main.tf - Core Resources
# ============================================================
# CloudWatch Log Group for Container Logs
# ============================================================
resource "aws_cloudwatch_log_group" "app" {
name = "/ecs/${var.environment}/${var.app_name}"
retention_in_days = 30
tags = {
Name = "${var.environment}-${var.app_name}-logs"
}
}
# ============================================================
# IAM Role for ECS Task Execution
# ============================================================
resource "aws_iam_role" "ecs_task_execution" {
name = "${var.environment}-${var.app_name}-ecs-task-execution"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
role = aws_iam_role.ecs_task_execution.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# Additional policy for Secrets Manager access
resource "aws_iam_role_policy" "secrets_access" {
count = length(var.secrets) > 0 ? 1 : 0
name = "${var.environment}-${var.app_name}-secrets-access"
role = aws_iam_role.ecs_task_execution.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue",
"kms:Decrypt"
]
Resource = [for secret in var.secrets : secret.valueFrom]
}
]
})
}
# ============================================================
# IAM Role for ECS Task (Application Runtime)
# ============================================================
resource "aws_iam_role" "ecs_task" {
name = "${var.environment}-${var.app_name}-ecs-task"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
}
# Add custom policies for your app (e.g., S3 access, DynamoDB, etc.)
# resource "aws_iam_role_policy" "app_permissions" { ... }
# ============================================================
# ECS Task Definition
# ============================================================
resource "aws_ecs_task_definition" "app" {
family = "${var.environment}-${var.app_name}"
network_mode = "bridge" # Use "awsvpc" for Fargate
requires_compatibilities = ["EC2"] # Use ["FARGATE"] for Fargate
cpu = var.app_cpu
memory = var.app_memory
execution_role_arn = aws_iam_role.ecs_task_execution.arn
task_role_arn = aws_iam_role.ecs_task.arn
container_definitions = jsonencode([
{
name = var.app_name
image = var.app_image
cpu = var.app_cpu
memory = var.app_memory
essential = true
portMappings = [
{
containerPort = var.app_port
hostPort = 0 # Dynamic port mapping (use var.app_port for Fargate)
protocol = "tcp"
}
]
environment = [
for key, value in var.environment_variables : {
name = key
value = value
}
]
secrets = var.secrets
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.app.name
"awslogs-region" = data.aws_region.current.name
"awslogs-stream-prefix" = var.app_name
}
}
healthCheck = { # Note: requires curl to be installed in the container image
command = [
"CMD-SHELL",
"curl -f http://localhost:${var.app_port}${var.health_check_path} || exit 1"
]
interval = 30
timeout = 5
retries = 3
startPeriod = 60
}
}
])
tags = {
Name = "${var.environment}-${var.app_name}"
}
}
# ============================================================
# ALB Target Group
# ============================================================
resource "aws_lb_target_group" "app" {
name = "${var.environment}-${var.app_name}-tg"
port = var.app_port
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "instance" # Use "ip" for Fargate
deregistration_delay = 30
health_check {
enabled = true
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
interval = 30
path = var.health_check_path
protocol = "HTTP"
matcher = var.health_check_matcher
}
tags = {
Name = "${var.environment}-${var.app_name}-tg"
}
}
# ============================================================
# ALB Listener Rule
# ============================================================
resource "aws_lb_listener_rule" "app" {
listener_arn = var.alb_listener_arn
priority = 100 # Adjust as needed (lower = higher priority)
action {
type = "forward"
target_group_arn = aws_lb_target_group.app.arn
}
condition {
host_header {
values = [var.domain_name]
}
}
tags = {
Name = "${var.environment}-${var.app_name}-rule"
}
}
# ============================================================
# ECS Service
# ============================================================
resource "aws_ecs_service" "app" {
name = "${var.environment}-${var.app_name}"
cluster = data.aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.app.arn
desired_count = var.desired_count
# Launch type (EC2 or FARGATE)
launch_type = "EC2" # Change to "FARGATE" if using Fargate
# Deployment configuration
deployment_maximum_percent = 200
deployment_minimum_healthy_percent = 50
# Load balancer configuration
load_balancer {
target_group_arn = aws_lb_target_group.app.arn
container_name = var.app_name
container_port = var.app_port
}
# Placement constraints (EC2 only)
placement_constraints {
type = "distinctInstance"
}
# Depends on listener rule to avoid race condition
depends_on = [aws_lb_listener_rule.app]
tags = {
Name = "${var.environment}-${var.app_name}"
}
lifecycle {
ignore_changes = [desired_count] # Allow manual scaling without Terraform drift
}
}
# ============================================================
# Route53 DNS Record
# ============================================================
resource "aws_route53_record" "app" {
zone_id = data.aws_route53_zone.main.zone_id
name = var.domain_name
type = "A"
alias {
name = data.aws_lb.main.dns_name
zone_id = data.aws_lb.main.zone_id
evaluate_target_health = false
}
}
Step 4: Outputs
outputs.tf - Export Important Values
output "task_definition_arn" {
description = "ARN of the ECS task definition"
value = aws_ecs_task_definition.app.arn
}
output "service_name" {
description = "Name of the ECS service"
value = aws_ecs_service.app.name
}
output "target_group_arn" {
description = "ARN of the target group"
value = aws_lb_target_group.app.arn
}
output "cloudwatch_log_group" {
description = "CloudWatch log group name"
value = aws_cloudwatch_log_group.app.name
}
output "app_url" {
description = "Application URL"
value = "https://${var.domain_name}"
}
output "dns_name" {
description = "DNS record created"
value = aws_route53_record.app.fqdn
}
Step 5: Deploy Your Infrastructure
Initialize Terraform
cd terraform/
terraform init
What this does:
Downloads AWS provider plugins
Initializes remote backend (if configured)
Prepares working directory
Plan Changes
terraform plan -out=tfplan
What this shows:
Resources to be created (green +)
Resources to be modified (yellow ~)
Resources to be destroyed (red -)
Total changes
Review carefully! This is your preview before applying changes.
Apply Changes
terraform apply tfplan
What happens:
Creates CloudWatch log group
Creates IAM roles and policies
Registers ECS task definition
Creates ALB target group
Creates ALB listener rule
Creates ECS service (launches containers)
Creates Route53 DNS record
Typical completion time: 2-5 minutes
Verify Deployment
# Check outputs
terraform output
# Check service status
aws ecs describe-services \
--cluster production-cluster \
--services $(terraform output -raw service_name)
# Check target health
aws elbv2 describe-target-health \
--target-group-arn $(terraform output -raw target_group_arn)
# View logs
aws logs tail $(terraform output -raw cloudwatch_log_group) --follow
Advanced: Terraform Modules
For reusability across multiple apps, create a module:
Module Structure
modules/
└── ecs-app/
├── main.tf
├── variables.tf
├── outputs.tf
└── README.md
environments/
├── production/
│ ├── main.tf # Uses module
│ ├── variables.tf
│ └── terraform.tfvars
└── staging/
├── main.tf
├── variables.tf
└── terraform.tfvars
modules/ecs-app/main.tf
Move all resources from previous main.tf into this module.
environments/production/main.tf - Use Module
module "my_app" {
source = "../../modules/ecs-app"
aws_region = var.aws_region
environment = "production"
project_name = "my-company"
app_name = "my-app"
app_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.2.3"
app_port = 8080
desired_count = 3
domain_name = "myapp.example.com"
hosted_zone_name = "example.com"
vpc_id = data.aws_vpc.main.id
ecs_cluster_name = "production-cluster"
alb_arn = data.aws_lb.production.arn
alb_listener_arn = data.aws_lb_listener.https.arn
environment_variables = {
ENV = "production"
}
}
module "another_app" {
source = "../../modules/ecs-app"
# Different configuration for another app
app_name = "api-service"
app_port = 3000
# ...
}
Benefits:
✅ Deploy multiple apps with same pattern
✅ Consistent configuration across environments
✅ Easy to maintain and update
✅ Reusable across projects
Deployment Workflow with CI/CD
GitHub Actions Example
.github/workflows/deploy.yml:
name: Deploy to ECS
on:
push:
branches: [main]
env:
AWS_REGION: us-east-1
ECR_REPOSITORY: my-app
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v1
- name: Build, tag, and push image
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
IMAGE_TAG: ${{ github.sha }}
run: |
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
echo "IMAGE=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_ENV
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: 1.5.0
- name: Terraform Init
working-directory: ./terraform
run: terraform init
- name: Terraform Plan
working-directory: ./terraform
run: |
terraform plan \
-var="app_image=${{ env.IMAGE }}" \
-out=tfplan
- name: Terraform Apply
working-directory: ./terraform
run: terraform apply -auto-approve tfplan
- name: Wait for deployment
run: |
aws ecs wait services-stable \
--cluster production-cluster \
--services production-my-app
Deployment Process
Push to main branch → Triggers workflow
Build Docker image → Tag with Git SHA
Push to ECR → Store image
Terraform plan → Show changes
Terraform apply → Update infrastructure
Wait for stability → Ensure deployment succeeds
Managing Updates
Update Application Code
1. Update app_image in terraform.tfvars
app_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.2.4"
2. Plan and apply
terraform plan -out=tfplan
terraform apply tfplan
What Terraform does:
Registers a new task definition revision (e.g., production-my-app:2)
Updates ECS service to use new task definition
ECS performs rolling deployment (zero downtime)
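To have ECS roll back automatically when a new revision fails its health checks, you can optionally add the deployment circuit breaker to the service — a sketch, not part of the aws_ecs_service definition shown earlier:

```hcl
resource "aws_ecs_service" "app" {
  # ... existing configuration from main.tf ...

  deployment_circuit_breaker {
    enable   = true
    rollback = true # Roll back to the last steady-state deployment on failure
  }
}
```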
Scale Application
Update desired_count in terraform.tfvars
desired_count = 5
Apply changes
terraform apply -var="desired_count=5"
Update Environment Variables
Edit terraform.tfvars
environment_variables = {
ENV = "production"
LOG_LEVEL = "debug" # Added
NEW_FEATURE_FLAG = "true" # Added
}
Apply
terraform apply
Important: Changing environment variables creates a new task definition and triggers deployment.
State Management Best Practices
Remote State with S3
terraform {
backend "s3" {
bucket = "my-company-terraform-state"
key = "production/ecs/my-app/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
# Enable versioning on S3 bucket
# Enable encryption
# Enable bucket logging
}
}
Create State Backend
Create S3 bucket
aws s3 mb s3://my-company-terraform-state --region us-east-1
Enable versioning
aws s3api put-bucket-versioning \
--bucket my-company-terraform-state \
--versioning-configuration Status=Enabled
Enable encryption
aws s3api put-bucket-encryption \
--bucket my-company-terraform-state \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}]
}'
Create DynamoDB table for locking
aws dynamodb create-table \
--table-name terraform-state-lock \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
State Commands
View current state
terraform state list
Show specific resource
terraform state show aws_ecs_service.app
Import existing resource
terraform import aws_ecs_service.app arn:aws:ecs:...
Remove resource from state (doesn't delete)
terraform state rm aws_ecs_service.app
Move resource to different address
terraform state mv aws_ecs_service.app aws_ecs_service.renamed
Troubleshooting Common Issues
Issue 1: Task Definition Already Exists
Error:
Error: creating ECS Task Definition: ClientException: Family already exists
Solution: Import the existing task definition into state (the import ID is the full ARN, including the revision) or use a different family name
terraform import aws_ecs_task_definition.app arn:aws:ecs:us-east-1:123456789012:task-definition/my-app:1
Issue 2: Target Group In Use
Error:
Error: deleting Target Group: ResourceInUse: Target group is in use
Solution: Remove listener rule first, then target group. Terraform handles this with depends_on.
Issue 3: Service Won't Stabilize
Symptoms: Terraform times out waiting for service to become stable
Check:
Service events
aws ecs describe-services --cluster production-cluster --services production-my-app
Target health
aws elbv2 describe-target-health --target-group-arn arn:...
Container logs
aws logs tail /ecs/production/my-app --follow
Common causes:
Health check path returning wrong status code
Container port mismatch
Security group blocking traffic
Container crashing on startup
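For the security group case: with bridge networking and dynamic host ports, the container instances must accept the ephemeral port range from the ALB. A sketch, where both security group IDs are assumed inputs not defined elsewhere in this guide:

```hcl
# Allow the ALB to reach dynamically mapped container ports.
# 32768-65535 is the default ephemeral range on ECS-optimized AMIs.
resource "aws_security_group_rule" "alb_to_ecs_instances" {
  type                     = "ingress"
  from_port                = 32768
  to_port                  = 65535
  protocol                 = "tcp"
  security_group_id        = var.ecs_instance_sg_id # hypothetical variable
  source_security_group_id = var.alb_sg_id          # hypothetical variable
}
```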
Issue 4: DNS Not Resolving
Check:
Verify record created
aws route53 list-resource-record-sets \
--hosted-zone-id Z1234567890ABC \
--query "ResourceRecordSets[?Name=='myapp.example.com.']"
Test DNS resolution
nslookup myapp.example.com
Cost Optimization
Right-Size Resources
Monitor CloudWatch metrics
Adjust based on actual usage
app_cpu = 256 # Start small
app_memory = 512 # Increase if needed
Use Spot Instances (Non-Production)
In ECS capacity provider
capacity_providers = ["FARGATE_SPOT"]
Up to ~70% cheaper than on-demand Fargate pricing
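In Terraform, Spot capacity is requested by replacing launch_type with a capacity_provider_strategy block on the service — a sketch, assuming FARGATE_SPOT is enabled as a capacity provider on the cluster:

```hcl
resource "aws_ecs_service" "app" {
  # Remove launch_type; it is mutually exclusive with a capacity provider strategy
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 1
  }
}
```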
Log Retention
resource "aws_cloudwatch_log_group" "app" {
retention_in_days = 7 # vs 30 or 90
}
Cleanup Unused Resources
Remove old task definition revisions
aws ecs list-task-definitions --family-prefix my-app --status INACTIVE
Security Best Practices
Use Secrets Manager
Never put secrets in environment variables!
Use secrets parameter instead:
secrets = [
{
name = "DATABASE_PASSWORD"
valueFrom = aws_secretsmanager_secret.db_password.arn
}
]
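The aws_secretsmanager_secret.db_password reference above assumes the secret is managed in your Terraform config; a minimal sketch:

```hcl
resource "aws_secretsmanager_secret" "db_password" {
  name = "${var.environment}/${var.app_name}/db-password"
}

# Caution: the secret value ends up in Terraform state; many teams
# instead set the value out-of-band via the CLI or console.
resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = var.db_password # declare this variable with sensitive = true
}
```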
Least Privilege IAM
Task role - only what app needs
resource "aws_iam_role_policy" "app" {
policy = jsonencode({
Statement = [
{
Effect = "Allow"
Action = ["s3:GetObject"]
Resource = ["arn:aws:s3:::my-bucket/*"]
}
]
})
}
Enable Container Insights
(In this guide the ECS cluster already exists, so apply this setting wherever the cluster is managed:)
resource "aws_ecs_cluster" "main" {
setting {
name = "containerInsights"
value = "enabled"
}
}
Network Isolation
Use awsvpc network mode with private subnets
network_mode = "awsvpc"
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.app.id]
}
Terraform vs AWS CLI: Comparison
| Aspect | AWS CLI | Terraform |
|--------------------|--------------------------|-----------------------------------------|
| Reproducibility | Manual re-execution | Declarative, version-controlled |
| Team Collaboration | Difficult (manual docs) | Easy (code review, shared state) |
| Rollback | Manual, error-prone | Revert commit, re-run terraform apply |
| Drift Detection | None | terraform plan shows drift |
| Dependencies | Manual ordering | Automatic dependency graph |
| Documentation | Separate | Infrastructure as code IS documentation |
| Learning Curve | Moderate (many commands) | Steeper initially, easier long-term |
| Multi-cloud | AWS only | Works across providers |
Key Takeaways
Infrastructure as Code: Terraform makes infrastructure reproducible, version-controlled, and collaborative
Modules: Reuse common patterns across multiple apps and environments
Remote State: Essential for team collaboration and state locking
Variables: Separate configuration from code for environment-specific values
Secrets Management: Never hardcode secrets - use Secrets Manager
CI/CD Integration: Automate deployments with GitHub Actions or similar
Incremental Adoption: Can import existing resources with terraform import
Next Steps
Set up remote state backend (S3 + DynamoDB)
Create reusable modules for common patterns
Implement CI/CD pipeline with Terraform
Add auto-scaling with Application Auto Scaling
Enable Container Insights for monitoring
Implement blue/green deployments with CodeDeploy
Add WAF rules to ALB for security
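The auto-scaling item above can be sketched with Application Auto Scaling target tracking (the 60% CPU target and 2-10 task range are arbitrary examples):

```hcl
resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.ecs_cluster_name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "${var.environment}-${var.app_name}-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 60 # Scale to keep average CPU near 60%
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```

This pairs well with the lifecycle ignore_changes = [desired_count] setting on the service, which stops Terraform from reverting the autoscaler's adjustments.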
Resources
https://registry.terraform.io/providers/hashicorp/aws/latest/docs
https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/
Questions or suggestions? Drop a comment below!
Found this helpful? Share with your team and follow for more DevOps content!