
EC2 Auto-Recovery

Overview

All four EC2 instances listed below have CloudWatch alarms that trigger automatic recovery when the instance's system status check fails, for example because the underlying host hardware has failed.

How It Works

  1. CloudWatch monitors the StatusCheckFailed_System metric for each instance
  2. If the check fails in 2 consecutive 1-minute periods, the alarm enters the ALARM state
  3. The alarm's recover action makes AWS migrate the instance to healthy hardware
  4. The instance retains its instance ID, private IP, EBS volumes, and Elastic IP
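The evaluation in steps 1 to 3 can be modelled in a few lines. This is a simplified sketch, not CloudWatch's actual implementation: it assumes 1-minute datapoints where StatusCheckFailed_System is 0 (pass) or 1 (fail), 2 evaluation periods that must both breach, and it ignores missing-data handling. `AlarmConfig`, `alarm_state`, and `triggers_recovery` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlarmConfig:
    # Assumed settings: 1-minute datapoints, 2 evaluation periods, both must breach.
    threshold: float = 1.0         # StatusCheckFailed_System is 0 (pass) or 1 (fail)
    evaluation_periods: int = 2    # number of most recent datapoints examined
    datapoints_to_alarm: int = 2   # how many of those must reach the threshold


def alarm_state(datapoints: Sequence[float], cfg: AlarmConfig = AlarmConfig()) -> str:
    """Simplified alarm evaluation over per-minute datapoints (oldest first).

    Ignores missing-data handling and other real CloudWatch details.
    """
    if len(datapoints) < cfg.evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-cfg.evaluation_periods:]
    breaching = sum(1 for value in window if value >= cfg.threshold)
    return "ALARM" if breaching >= cfg.datapoints_to_alarm else "OK"


def triggers_recovery(previous_state: str, new_state: str) -> bool:
    """Alarm actions (here, the recover action) run when the alarm transitions into ALARM."""
    return new_state == "ALARM" and previous_state != "ALARM"


if __name__ == "__main__":
    history = [0, 0, 1, 1]              # two consecutive failed minutes at the end
    before = alarm_state(history[:-1])  # window [0, 1]: only one failure
    after = alarm_state(history)        # window [1, 1]: both periods failed
    print(before, after, triggers_recovery(before, after))
```

A single failed minute followed by a passing one leaves the alarm in OK, which is why a brief blip does not cause a migration.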

Covered Instances

Instance                Alarm Name                           Status
prod-nhc-django         prod-nhc-django-auto-recover         ✅ Active
prod-nhc-app            prod-nhc-app-auto-recover            ✅ Active
prod-nhc-foursites      prod-nhc-foursites-auto-recover      ✅ Active
prod-nhc-gitlab-runner  prod-nhc-gitlab-runner-auto-recover  ✅ Active
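The alarms in the table follow a `<instance>-auto-recover` naming pattern. A sketch of how the parameters for one such alarm could be built is below. It assumes a 60-second period, 2 evaluation periods, and a threshold of 1, which matches the "How It Works" description but has not been checked against the live alarms. The instance ID and region in the example are hypothetical placeholders, and `recovery_alarm_params` is an illustrative helper, not an existing function.

```python
from typing import Any, Dict, List

# Instance Name tags from the table above.
INSTANCES: List[str] = [
    "prod-nhc-django",
    "prod-nhc-app",
    "prod-nhc-foursites",
    "prod-nhc-gitlab-runner",
]


def recovery_alarm_params(name: str, instance_id: str, region: str) -> Dict[str, Any]:
    """Keyword arguments for CloudWatch PutMetricAlarm (boto3: put_metric_alarm).

    Period, EvaluationPeriods, and Threshold are assumed values; verify them
    against the existing alarms before reusing this.
    """
    return {
        "AlarmName": f"{name}-auto-recover",  # naming pattern from the table
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                  # 1-minute datapoints (assumed)
        "EvaluationPeriods": 2,        # 2 consecutive periods must breach (assumed)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 recover action for the instance's region.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


if __name__ == "__main__":
    # Hypothetical instance ID and region, for illustration only.
    params = recovery_alarm_params(INSTANCES[0], "i-0123456789abcdef0", "us-east-1")
    print(params["AlarmName"], params["AlarmActions"][0])
```

With boto3 installed and AWS credentials configured, the resulting dict could be passed as `boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)`. Building the parameters separately keeps the configuration easy to inspect and test without touching AWS.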

Limitations

  • Only recovers from system status check failures (underlying hardware)
  • Does not recover from instance status check failures (OS-level issues)
  • Instance must use EBS-backed storage (all our instances do)
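The limitations reduce to a simple rule, which a runbook script might use to decide whether to wait for automatic recovery. The sketch below is hypothetical: `FailedCheck` and `auto_recovery_applies` are illustrative names, not part of any AWS API. StatusCheckFailed_Instance is the CloudWatch metric for instance status check failures.

```python
from enum import Enum


class FailedCheck(Enum):
    SYSTEM = "StatusCheckFailed_System"      # problem with the underlying AWS host
    INSTANCE = "StatusCheckFailed_Instance"  # OS-level problem inside the instance


def auto_recovery_applies(failed: FailedCheck, ebs_backed: bool) -> bool:
    """Encode the limitations above: only system check failures on EBS-backed instances."""
    return failed is FailedCheck.SYSTEM and ebs_backed


if __name__ == "__main__":
    for check in FailedCheck:
        print(check.value, auto_recovery_applies(check, ebs_backed=True))
```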