Design a security audit-logging and detection pipeline for a SaaS application.
Key Talking Points
- ✓Define what to log: authentication events (successful and failed logins, logouts, MFA challenges), authorization denials, sensitive data reads and writes, admin/privileged actions, configuration changes, and security-control changes (e.g., MFA disabled, password changed).
- ✓Log schema: every event needs a UTC timestamp, event_type, actor (user_id + session_id + IP), resource (what was acted on), outcome (success/failure), and a unique event_id. Never log secrets, PII beyond the minimum needed, or full request bodies.
- ✓Structured logging (JSON) at the source, so logs are machine-queryable without parsing. Emit events to a log pipeline: a stream such as Kafka or Kinesis, or a SIEM's ingest endpoint.
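- ✓Tamper evidence: ship logs to an append-only store outside the application's control (a separate account or region, written through a write-only IAM role with no delete permissions). If an attacker compromises the app, they should not be able to erase evidence. Options include S3 Object Lock in compliance mode (locked objects cannot be overwritten or deleted until their retention period expires) or forwarding to a separately administered SIEM such as Splunk.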
- ✓Real-time detection: stream events through detection rules (a SIEM, or custom Lambda/Flink jobs) for high-signal patterns: bursts of authentication failures, impossible travel (logins from two distant locations within a short time), privilege escalation, and bulk data export.
- ✓Retention and tiering: hot store (SIEM, 90 days) for active investigation; cold store (S3 Glacier) for compliance (typically 1–7 years depending on regulation).
- ✓Alerting: integrate with PagerDuty/OpsGenie for critical alerts (active account takeover, admin privilege misuse). Avoid alert fatigue by tuning rules to improve the signal-to-noise ratio, and write a runbook for each alert type.
- ✓Access control on audit logs: only the security team (and a read-only auditor role) should be able to read logs. Developers should not be able to query other users' audit records.
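The "Define what to log", "Log schema", and structured-logging points above can be combined into one event type. The following is a minimal Python sketch that serializes each event as a single compact JSON object per line. The `EventType` names, the `FORBIDDEN_KEYS` deny-list, and the `AuditEvent`/`Actor` classes are hypothetical illustrations, not a standard schema. A key deny-list on its own is also not sufficient PII minimization.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from enum import Enum


class EventType(str, Enum):
    # Hypothetical event-type names covering the categories listed above.
    AUTH_LOGIN = "auth.login"
    AUTH_LOGOUT = "auth.logout"
    AUTH_MFA_CHALLENGE = "auth.mfa_challenge"
    AUTHZ_DENIED = "authz.denied"
    DATA_READ = "data.read"
    DATA_WRITE = "data.write"
    ADMIN_ACTION = "admin.action"
    CONFIG_CHANGE = "config.change"
    SECURITY_CONTROL_CHANGE = "security.control_change"


# Hypothetical deny-list: metadata keys that must never be logged.
FORBIDDEN_KEYS = {"password", "token", "secret", "api_key", "authorization", "body"}


@dataclass(frozen=True)
class Actor:
    user_id: str
    session_id: str
    ip: str


@dataclass(frozen=True)
class AuditEvent:
    event_type: EventType
    actor: Actor
    resource: str                 # what was acted on
    outcome: str                  # "success" or "failure"
    metadata: dict = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self):
        if self.outcome not in ("success", "failure"):
            raise ValueError(f"invalid outcome: {self.outcome!r}")
        # Reject events whose metadata would leak secrets or raw request bodies.
        leaked = FORBIDDEN_KEYS & {k.lower() for k in self.metadata}
        if leaked:
            raise ValueError(f"forbidden metadata keys: {sorted(leaked)}")

    def to_json(self) -> str:
        # One compact JSON object per line, so aggregators can ingest it directly.
        record = asdict(self)
        record["event_type"] = self.event_type.value
        return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

For example, `AuditEvent(EventType.AUTH_LOGIN, Actor("u-123", "s-456", "203.0.113.7"), "session", "failure").to_json()` yields one line that a Kafka producer or SIEM forwarder can ship unchanged.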
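Storage-level immutability, as in the tamper-evidence point, can be complemented by hash chaining, a technique not named above. In a hash chain, each entry stores the SHA-256 of the previous entry's hash concatenated with its own canonical JSON, so editing or deleting any record breaks every later link. The following is a minimal standard-library sketch. The entry layout and the `append_record`/`verify_chain` names are assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting hash for an empty chain


def chain_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the same record always hashes the same.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev, "hash": chain_hash(prev, record)})


def verify_chain(chain: list) -> int:
    """Return the index of the first broken entry, or -1 if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev or entry["hash"] != chain_hash(prev, entry["record"]):
            return i
        prev = entry["hash"]
    return -1
```

A chain cannot detect truncation of its newest entries by itself. For that reason, the latest hash would need to be written periodically to the separate append-only store, where the application cannot rewrite it.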
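Two of the real-time detection patterns above can be sketched as stateful per-user checks:

- a sliding-window count of failed logins;
- impossible travel, computed as the implied great-circle (haversine) speed between consecutive logins.

The thresholds (5 failures in 60 s, 900 km/h) are illustrative assumptions. The rules also assume events arrive in timestamp order and that IP-to-coordinate geolocation happens upstream. In a Flink job, this per-user state would live in keyed state rather than in process memory.

```python
import math
from collections import defaultdict, deque


class FailedLoginRule:
    """Fires when a user accumulates `threshold` failed logins within `window_s` seconds."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0):
        self.threshold = threshold           # illustrative default; tune per environment
        self.window_s = window_s
        self._failures = defaultdict(deque)  # user_id -> timestamps of recent failures

    def observe_failure(self, user_id: str, ts: float) -> bool:
        window = self._failures[user_id]
        window.append(ts)
        # Evict failures that have slid out of the time window.
        while window and ts - window[0] > self.window_s:
            window.popleft()
        return len(window) >= self.threshold


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, treating Earth as a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point error near antipodal points cannot break asin.
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


class ImpossibleTravelRule:
    """Fires when the implied speed between a user's consecutive logins exceeds max_speed_kmh."""

    def __init__(self, max_speed_kmh: float = 900.0):
        self.max_speed_kmh = max_speed_kmh  # roughly airliner cruise speed (an assumption)
        self._last = {}                     # user_id -> (ts, lat, lon) of the previous login

    def observe_login(self, user_id: str, ts: float, lat: float, lon: float) -> bool:
        prev = self._last.get(user_id)
        self._last[user_id] = (ts, lat, lon)
        if prev is None:
            return False
        prev_ts, prev_lat, prev_lon = prev
        # Clamp the gap to 1 s so near-simultaneous logins do not divide by zero.
        hours = max(ts - prev_ts, 1.0) / 3600.0
        return haversine_km(prev_lat, prev_lon, lat, lon) / hours > self.max_speed_kmh
```

With these defaults, a login from London followed one hour later by a login from New York (about 5,570 km apart) implies a speed far above 900 km/h, so the second call returns `True`.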
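The tiering policy in the retention point comes down to a decision on event age. The following sketch uses hypothetical parameters: `hot_days=90` matches the SIEM window above, and `retention_years=7` is the upper end of the quoted range, approximated with 365-day years. In practice, this policy is usually expressed as storage lifecycle rules (for example, an S3 lifecycle transition to Glacier) rather than as application code.

```python
from datetime import datetime, timedelta, timezone


def storage_tier(event_time: datetime, now: datetime,
                 hot_days: int = 90, retention_years: int = 7) -> str:
    """Return "hot", "cold", or "expired" for an event of the given age.

    Hypothetical parameters; retention_years depends on the applicable regulation
    and is approximated here with 365-day years.
    """
    age = now - event_time
    if age < timedelta(days=hot_days):
        return "hot"       # searchable in the SIEM for active investigation
    if age < timedelta(days=365 * retention_years):
        return "cold"      # archive tier, e.g. S3 Glacier, kept for compliance
    return "expired"       # past the compliance window
```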
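A common way to limit the alert fatigue mentioned in the alerting point combines two measures:

- suppress repeats of the same rule firing for the same subject within a time window;
- page only on critical severities.

The following sketch assumes that approach. `Alert`, `AlertRouter`, the 15-minute window, and the `"page"`/`"ticket"`/`"suppressed"` outcomes are hypothetical, and no PagerDuty or OpsGenie API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    rule: str        # detection rule that fired, e.g. "impossible_travel"
    severity: str    # e.g. "critical", "high", "low" (assumed scale)
    subject: str     # entity the alert is about, e.g. a user_id
    ts: float        # epoch seconds


class AlertRouter:
    """Hypothetical router: pages on critical alerts, files tickets for the rest,
    and suppresses repeats of the same (rule, subject) within suppress_s seconds."""

    def __init__(self, suppress_s: float = 900.0, page_severities=("critical",)):
        self.suppress_s = suppress_s
        self.page_severities = frozenset(page_severities)
        self._last_sent = {}  # (rule, subject) -> ts of the last alert actually sent

    def route(self, alert: Alert) -> str:
        key = (alert.rule, alert.subject)
        last = self._last_sent.get(key)
        if last is not None and alert.ts - last < self.suppress_s:
            return "suppressed"
        self._last_sent[key] = alert.ts
        return "page" if alert.severity in self.page_severities else "ticket"
```

A `"page"` result marks where the paging integration would be invoked, ideally with a link to that rule's runbook.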
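The access-control point can be expressed as a small policy check. Only the security and read-only auditor roles may read audit records, and no role may modify or delete them. The role names and the `authorize` function are hypothetical. Consistent with the tamper-evidence point, real enforcement belongs in the log store's own IAM policies rather than only in application code.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical role names for illustration.
    SECURITY = "security"
    AUDITOR = "auditor"
    DEVELOPER = "developer"


class Action(Enum):
    READ = "read"
    MODIFY = "modify"
    DELETE = "delete"


# Only these roles may read audit records.
READ_ROLES = frozenset({Role.SECURITY, Role.AUDITOR})


def authorize(role: Role, action: Action) -> bool:
    """Return True if `role` may perform `action` on audit records."""
    if action is Action.READ:
        return role in READ_ROLES
    # Audit records are append-only: modification and deletion are denied to every role.
    return False
```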
An audit-logging pipeline must be tamper-evident, comprehensive, and queryable for forensics, and it must feed real-time detection so threats are caught quickly. The design has to balance log volume and storage cost against completeness.