EC2 Auto-Recovery
Overview
All 4 EC2 instances have CloudWatch alarms that trigger automatic recovery when the underlying hardware fails.
How It Works
- CloudWatch monitors the `StatusCheckFailed_System` metric
- If 2 consecutive failures occur within 2 minutes, the alarm triggers
- AWS migrates the instance to healthy hardware
- The instance retains its instance ID, private IP, EBS volumes, and Elastic IP
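
The alarm described above could be defined roughly as follows. This is a minimal sketch built around boto3's `put_metric_alarm`, not the configuration actually deployed. The 60-second period with 2 evaluation periods is inferred from "2 consecutive failures within 2 minutes". The `Minimum` statistic, the region, the instance ID, and the helper names `recovery_alarm_params` and `create_recovery_alarm` are assumptions made for illustration.

```python
def recovery_alarm_params(instance_name, instance_id, region):
    """Build keyword arguments for CloudWatch put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"{instance_name}-auto-recover",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",  # assumption: the deployed alarms may use another statistic
        "Period": 60,  # seconds per evaluation period (inferred)
        "EvaluationPeriods": 2,  # 2 consecutive failing periods = 2 minutes
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Built-in EC2 recover action; no SNS topic or Lambda function is required.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_name, instance_id, region):
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the parameter builder runs without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **recovery_alarm_params(instance_name, instance_id, region)
    )
```

Keeping the parameters in a plain dictionary makes them easy to review or diff before anything is sent to AWS. A call such as `create_recovery_alarm("prod-nhc-django", "i-0123456789abcdef0", "us-east-1")` would then apply them; the instance ID and region here are placeholders.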
Covered Instances
| Instance | Alarm Name | Status |
|---|---|---|
| prod-nhc-django | prod-nhc-django-auto-recover | ✅ Active |
| prod-nhc-app | prod-nhc-app-auto-recover | ✅ Active |
| prod-nhc-foursites | prod-nhc-foursites-auto-recover | ✅ Active |
| prod-nhc-gitlab-runner | prod-nhc-gitlab-runner-auto-recover | ✅ Active |
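
One way to confirm that the alarms in this table still exist is to compare the expected names with what CloudWatch reports. The sketch below assumes the `<instance>-auto-recover` naming pattern seen in the table. The helper names and the region argument are hypothetical, and `fetch_alarm_names` needs boto3 and AWS credentials.

```python
# Instance names taken from the table above.
INSTANCES = [
    "prod-nhc-django",
    "prod-nhc-app",
    "prod-nhc-foursites",
    "prod-nhc-gitlab-runner",
]


def expected_alarm_name(instance_name):
    """Alarm name under the '<instance>-auto-recover' convention."""
    return f"{instance_name}-auto-recover"


def missing_alarms(existing_alarm_names):
    """Return the expected alarm names that are absent from the given names."""
    existing = set(existing_alarm_names)
    expected = [expected_alarm_name(name) for name in INSTANCES]
    return [alarm for alarm in expected if alarm not in existing]


def fetch_alarm_names(region):
    """Look up the expected alarms in CloudWatch and return the names found."""
    import boto3  # imported lazily so the pure helpers run without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.describe_alarms(
        AlarmNames=[expected_alarm_name(name) for name in INSTANCES]
    )
    return [alarm["AlarmName"] for alarm in response["MetricAlarms"]]
```

An empty result from `missing_alarms(fetch_alarm_names(region))` would indicate that all four alarms are present. It does not check whether their recover actions are enabled.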
Limitations
- Only recovers from system status check failures (underlying hardware)
- Does not recover from instance status check failures (OS-level issues)
- Instance must use EBS-backed storage (all our instances do)
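
To illustrate the difference between the two check types, the sketch below classifies a single entry from the `InstanceStatuses` list returned by the EC2 `describe_instance_status` API. The function name `recovery_coverage` and the returned labels are hypothetical and are not part of any existing tooling.

```python
def recovery_coverage(status):
    """Classify one describe_instance_status entry (hypothetical helper)."""
    system = status["SystemStatus"]["Status"]
    instance = status["InstanceStatus"]["Status"]
    if system == "impaired":
        # Underlying host problem: the auto-recover alarm applies.
        return "system-check-failed: covered by auto-recovery"
    if instance == "impaired":
        # OS-level problem: the auto-recover alarm does not fire.
        return "instance-check-failed: not covered by auto-recovery"
    return "no-failed-checks"


# Example entry shaped like the API response, trimmed to the fields used here.
sample = {
    "InstanceId": "i-0123456789abcdef0",  # placeholder ID
    "SystemStatus": {"Status": "ok"},
    "InstanceStatus": {"Status": "impaired"},
}
print(recovery_coverage(sample))
```

The system check is tested first because a host failure usually matters more for recovery decisions than an instance-level failure reported at the same time.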