Design a secrets rotation system for a microservices fleet to eliminate long-lived static credentials.
Key Talking Points
- ✓Central secrets manager (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager) is the single source of truth. Services fetch credentials at startup and on rotation events — never bake them into images or env files.
- ✓Dynamic secrets (preferred for DB credentials): the secrets manager issues short-lived, per-service credentials on demand (e.g. Vault Database Engine creates a temporary DB user valid for 1 hour). Before the credential expires, the service fetches a fresh one. No manual rotation is needed, and a leaked credential is usable only until its TTL runs out.
- ✓For static secrets that can't be dynamic (third-party API keys): automate rotation via the vendor's API (where available) and make sure every service learns about the change, either by push (a pub/sub event, e.g. SNS) or by pull (periodic re-read of the secret, or a re-fetch when a Vault lease expires). Services must hot-reload credentials without a restart.
- ✓Graceful rotation (two-phase): (1) generate the new credential and distribute it while the old one stays valid; (2) after a grace period, once the new credential is verified to work, revoke the old one. This avoids a window where no valid credential exists.
- ✓Service identity for fetching secrets: use platform identity (Kubernetes ServiceAccount + Vault Kubernetes auth, or EC2 IAM instance role) so services prove who they are without a bootstrap secret.
- ✓TLS cert rotation: use cert-manager (Kubernetes) or AWS ACM with auto-renew. Short-lived certs (90 days, as with Let's Encrypt) force frequent rotation by design.
- ✓Audit every secret access and rotation event. Alert on unexpected access patterns (e.g. a service reading another service's secret).
- ✓Break-glass: maintain an emergency procedure for manual access with mandatory review and short TTL.
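The two-phase graceful rotation described above can be sketched roughly as follows. This is a minimal in-memory illustration, not a real secrets manager integration: `RotatingSecret`, `begin_rotation`, `finish_rotation`, and the `verify` callback are hypothetical names introduced here, and a real system would persist state and call the vendor's or database's API to mint and revoke credentials.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Callable, Optional


def _same(a: str, b: str) -> bool:
    # Constant-time comparison to avoid leaking credential contents via timing.
    return hmac.compare_digest(a.encode(), b.encode())


@dataclass
class RotatingSecret:
    """Hypothetical holder: the current credential plus, mid-rotation, the old one."""
    current: str
    previous: Optional[str] = None

    def is_valid(self, candidate: str) -> bool:
        # During the grace period both the new and the old credential are accepted.
        if _same(candidate, self.current):
            return True
        return self.previous is not None and _same(candidate, self.previous)


def begin_rotation(secret: RotatingSecret) -> str:
    """Phase 1: mint a new credential and keep the old one valid."""
    secret.previous = secret.current
    secret.current = secrets.token_urlsafe(32)
    return secret.current


def finish_rotation(secret: RotatingSecret, verify: Callable[[str], bool]) -> bool:
    """Phase 2: revoke the old credential, but only if the new one verifies."""
    if not verify(secret.current):
        return False  # old credential stays valid; the caller should alert or retry
    secret.previous = None
    return True
```

The point of the design is that revocation is gated on verification: if `verify` fails, the old credential remains usable, so there is never a moment with no valid credential.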
Secrets rotation removes the risk of long-lived credentials being silently compromised and used indefinitely. The design challenge is rotating without downtime and ensuring services always have a valid credential.
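One way a service might keep a valid credential at all times is to cache the leased credential and refetch it before the lease ends, rather than waiting for a failure. The sketch below assumes a hypothetical `fetch` callable that returns a `(value, ttl_seconds)` pair (in practice this would wrap a call to the secrets manager); `LeasedCredential` and `refresh_fraction` are illustrative names, not part of any particular library.

```python
import time
from typing import Callable, Optional, Tuple


class LeasedCredential:
    """Caches a short-lived credential and refetches it before the lease ends."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # hypothetical: returns (value, ttl_seconds)
        refresh_fraction: float = 0.8,           # assumption: refresh at 80% of the TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._refresh_fraction = refresh_fraction
        self._clock = clock
        self._value: Optional[str] = None
        self._refresh_at = 0.0

    def get(self) -> str:
        """Return the cached credential, refetching if the refresh point has passed."""
        now = self._clock()
        if self._value is None or now >= self._refresh_at:
            value, ttl = self._fetch()
            self._value = value
            # Refresh early so callers never hold a credential right at expiry.
            self._refresh_at = now + ttl * self._refresh_fraction
        return self._value
```

Callers would use `cred.get()` each time they open a connection instead of storing the value themselves. This sketch is not thread-safe and does not retry failed fetches; a production version would likely add a lock and fall back to the still-valid cached value while retrying.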