Exporting Audit Logs to a SIEM

Audit events are only useful for compliance and incident response once they leave the portal and land in a queryable, tamper-evident store. This guide ships structured JSON audit logs from a Backstage portal to a SIEM — Splunk or Elasticsearch — with reliable delivery, defined retention, and the field schema auditors expect.

Centralizing logs is the operational endpoint of Audit Logging & Compliance within your Authentication, RBAC & Security Governance program. The portal emits events; a log shipper batches and forwards them over TLS; the SIEM indexes them and enforces retention. Getting the shipping layer right is what makes SOC 2 and ISO 27001 evidence collection a query instead of a fire drill.

Prerequisites

Reliable shipping depends on a chain of givens: the portal emits parseable JSON, a shipper with a disk buffer forwards it, the SIEM token comes from a vault, and a retention window is already decided. Miss one and the export is either lossy or non-compliant.

The disk buffer on the shipper is what turns a SIEM outage from data loss into a delayed replay.

Structured logging enabled in Backstage >= 1.20.0 so the backend emits JSON (not pretty-printed) logs with a stable event_id, actor, action, and correlation_id.
A log shipper: Fluent Bit >= 3.0 as a Kubernetes DaemonSet, or the Elastic Filebeat >= 8.13 agent. The examples use Fluent Bit.
SIEM ingest endpoint and token: a Splunk HEC token or an Elasticsearch API key, stored in Vault and surfaced as ${SIEM_API_KEY} — never inline.
Network egress allowlisting the SIEM endpoint on 443 with TLS 1.3, matching the egress baseline from the parent section.
Retention policy decided: hot tier for recent queryable logs, cold/archive tier for the compliance window.

Exact Configuration

The configuration is a four-link chain: the portal emits JSON, the shipper tails and filters to audit events, the SIEM enforces retention with a lifecycle policy, and the index is locked append-only so it is tamper-evident.

Append-only plus a write-only credential is exactly what auditors mean by tamper-evident.

1. Emit audit events as structured JSON

Ensure the portal writes one JSON object per event to stdout so the shipper can parse it without regex. The envelope mirrors the schema from the audit-logging baseline.

# app-config.production.yaml — Requires Backstage >= 1.20.0
backend:
  logger:
    format: json
    level: info
auditLog:
  enabled: true
  includeFields: [event_id, timestamp, actor, action, resource, outcome, correlation_id]
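
Before wiring up the shipper, it can help to confirm that a captured log line really is a single JSON object carrying every field in the envelope. The following is a minimal sketch, assuming the field list from includeFields above; the missing_fields helper and the sample values are hypothetical and exist only for illustration.

```python
import json

# Field list copied from includeFields in the config above.
REQUIRED_FIELDS = ("event_id", "timestamp", "actor", "action",
                   "resource", "outcome", "correlation_id")


def missing_fields(line: str) -> list[str]:
    """Return the required fields absent from one log line.

    Raises ValueError if the line is not a single JSON object.
    """
    event = json.loads(line)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(event, dict):
        raise ValueError("log line is not a JSON object")
    return [name for name in REQUIRED_FIELDS if name not in event]


# Hypothetical sample event; the values are illustrative only.
sample = json.dumps({
    "event_id": "evt-0001",
    "timestamp": "2024-01-01T00:00:00Z",
    "actor": "user:default/alice",
    "action": "rbac.policy_update",
    "resource": "policy/catalog-admin",
    "outcome": "success",
    "correlation_id": "req-42",
})

print(missing_fields(sample))  # prints []
```

Piping a few lines of real backend output through a check like this catches a pretty-printed or partially populated envelope before it reaches the SIEM.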

2. Ship logs with Fluent Bit

The pipeline tails the container logs, parses JSON, keeps only audit events, and forwards them. Batching and an on-disk buffer guarantee delivery across SIEM restarts.

# fluent-bit.conf — Requires Fluent Bit >= 3.0
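[SERVICE]
    Flush         5
    Log_Level     info
    # Expose the built-in metrics endpoint that the Validation checks query on port 2020
    HTTP_Server   On
    HTTP_Port     2020
    storage.path  /var/log/flb-storage/
    storage.backlog.mem_limit 64M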

[INPUT]
    Name              tail
    Path              /var/log/containers/portal-backend-*.log
    Parser            docker
    Tag               portal.audit
    storage.type      filesystem

[FILTER]
    Name    grep
    Match   portal.audit
    Regex   log "action":

[OUTPUT]
    Name              splunk
    Match             portal.audit
    Host              ${SIEM_HOST}
    Port              443
    TLS               On
    TLS.Verify        On
    Splunk_Token      ${SIEM_API_KEY}
    Splunk_Send_Raw   On
    # Retry indefinitely: a numeric limit discards the chunk once its retries run out
    Retry_Limit       False

For an Elasticsearch destination, swap the output block:

# Requires Fluent Bit >= 3.0, Elasticsearch >= 8.13
[OUTPUT]
    Name              es
    Match             portal.audit
    Host              ${SIEM_HOST}
    Port              443
    TLS               On
    HTTP_Auth_Header  ApiKey ${SIEM_API_KEY}
    Index             portal-audit
    Suppress_Type_Name On
    # Retry indefinitely: a numeric limit discards the chunk once its retries run out
    Retry_Limit       False
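
The grep filter decides which records ever reach the SIEM, so it is worth seeing what the rule keeps and drops. The sketch below imitates the filter in Python under two assumptions: container logs are in Docker JSON format (the application line sits in the log key, as the docker parser expects), and Python's re module behaves like Fluent Bit's regex engine for this literal pattern. The docker_wrap and keep_record helpers are hypothetical.

```python
import json
import re

# Same literal as the filter rule: Regex log "action":
AUDIT_PATTERN = re.compile(r'"action":')


def docker_wrap(app_line: str) -> str:
    """Wrap an application log line the way the Docker JSON log driver does."""
    return json.dumps({
        "log": app_line + "\n",
        "stream": "stdout",
        "time": "2024-01-01T00:00:00Z",  # illustrative timestamp
    })


def keep_record(docker_line: str) -> bool:
    """Return True if the grep rule would keep this record."""
    record = json.loads(docker_line)
    return bool(AUDIT_PATTERN.search(record.get("log", "")))


audit = docker_wrap(json.dumps({"event_id": "evt-0001",
                                "action": "rbac.policy_update"}))
plain = docker_wrap(json.dumps({"level": "info",
                                "message": "listening on :7007"}))

kept = [keep_record(audit), keep_record(plain)]
print(kept)  # prints [True, False]
```

Because the rule is a substring match on the raw line, any non-audit log whose text happens to contain "action": would also pass; matching on a parsed field is stricter if that becomes a problem.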

3. Enforce retention at the SIEM

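Shipping is half the contract; the SIEM must keep logs for the mandated window and then expire them. The example uses an Elasticsearch ILM policy: the write index rolls over after 30 days (or at 50 GB per primary shard), moves to cold 30 days after rollover, and is deleted 395 days after rollover. ILM measures min_age from the rollover time, so every event is kept for at least 395 days.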

// PUT _ilm/policy/portal-audit — Requires Elasticsearch >= 8.13
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "30d", "max_primary_shard_size": "50gb" } } },
      "cold":   { "min_age": "30d", "actions": { "readonly": {} } },
      "delete": { "min_age": "395d", "actions": { "delete": {} } }
    }
  }
}
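
The retention an event actually gets depends on when it was written relative to rollover, because ILM counts min_age from the rollover time. The arithmetic can be sketched as follows, using the 30d and 395d values from the policy above; the 365-day compliance window is an assumption for illustration, and the calculation ignores the ILM poll interval and early size-based rollovers.

```python
from datetime import timedelta

ROLLOVER_MAX_AGE = timedelta(days=30)    # hot phase: rollover max_age
DELETE_MIN_AGE = timedelta(days=395)     # delete phase: min_age after rollover
COMPLIANCE_WINDOW = timedelta(days=365)  # assumption: a 12-month requirement

# An event written just before rollover lives only for the delete min_age.
shortest = DELETE_MIN_AGE
# An event written when the index was created also waits out the hot phase.
longest = ROLLOVER_MAX_AGE + DELETE_MIN_AGE
# Safety margin between the shortest retention and the compliance window.
margin = shortest - COMPLIANCE_WINDOW

print(shortest.days, longest.days, margin.days)  # prints 395 425 30
```

The shortest figure is the one to compare against the compliance window; the longest figure drives storage cost.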

4. Lock down the index against tampering

Restrict write access to the shipper’s API key only, and disable updates so events are append-only. This append-only posture is what auditors mean by tamper-evident. Scope the shipper’s credential narrowly using the same ownership discipline as your Role-Based Access Control Setup.

# Requires Elasticsearch >= 8.13 — create a write-only role for the shipper
curl -s -X POST "https://${SIEM_HOST}/_security/role/portal-audit-writer" \
  -H "Authorization: ApiKey ${SIEM_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"indices":[{"names":["portal-audit*"],"privileges":["create_index","create_doc"]}]}'

Validation

Confirm the whole chain moves: the shipper is processing records, the SIEM has a recent event, the lifecycle policy is attached, and the disk buffer is draining rather than backing up.

A growing buffer with a healthy shipper means the SIEM side is the problem — check egress and TLS.

# 1. Fluent Bit is parsing and matching audit events (not dropping them)
kubectl exec ds/fluent-bit -n logging -- curl -s http://127.0.0.1:2020/api/v1/metrics \
  | jq '.output."splunk.0".proc_records'
# Expect: a non-zero, increasing count

# 2. The SIEM received a recent event (Elasticsearch)
# Query with a read-capable key: the shipper's write-only key cannot search
curl -s "https://${SIEM_HOST}/portal-audit*/_search?q=action:rbac.policy_update&size=1" \
  -H "Authorization: ApiKey ${SIEM_ADMIN_KEY}" | jq '.hits.total.value'
# Expect: >= 1

# 3. ILM policy is attached and tracking the index
curl -s "https://${SIEM_HOST}/portal-audit*/_ilm/explain" \
  -H "Authorization: ApiKey ${SIEM_ADMIN_KEY}" | jq '.indices[].policy'
# Expect: "portal-audit"

# 4. Buffer is draining, not backing up
kubectl exec ds/fluent-bit -n logging -- du -sh /var/log/flb-storage/
# Expect: small and stable, not growing unbounded
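
For a check that can run in a scheduled job rather than by eye, the metrics payload from the first command can be reduced to a single verdict. This is a sketch, assuming the payload nests per-output counters under output and the output name, with proc_records, errors, and retries_failed fields; the output_health helper is hypothetical and the sample payload is hand-written, not captured from a cluster.

```python
import json


def output_health(metrics_json: str, output_name: str = "splunk.0") -> dict:
    """Summarize one output's counters from the metrics payload."""
    stats = json.loads(metrics_json)["output"][output_name]
    delivered = stats.get("proc_records", 0)
    gave_up = stats.get("retries_failed", 0)
    return {
        "delivered": delivered,
        "errors": stats.get("errors", 0),
        "gave_up": gave_up,
        # Healthy here means: something was delivered and nothing was discarded.
        "healthy": delivered > 0 and gave_up == 0,
    }


# Hand-written sample payload for illustration only.
sample_metrics = json.dumps({
    "output": {
        "splunk.0": {"proc_records": 1200, "proc_bytes": 480000,
                     "errors": 0, "retries": 3, "retries_failed": 0}
    }
})

report = output_health(sample_metrics)
print(report)
```

A non-zero retries_failed count is the signal to act on: it means chunks exhausted their retries and were discarded, which is a delivery gap rather than a delay.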

Edge Cases & Troubleshooting

Export failures fall into a few classes: delivery gaps, duplication, schema drift, and retention misconfiguration. Each has a mechanical fix, and the table below maps symptoms to root causes and resolutions.

Using event_id as the document id makes at-least-once delivery safe by turning retries into idempotent writes.

Symptom	Root Cause	Resolution
Events missing in SIEM but pods healthy	`grep` filter regex too strict, dropping events	Relax the `Regex` match or confirm logs contain the `action` field
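Filesystem buffer growing unbounded	SIEM unreachable; shipper retrying	Check egress/TLS to `${SIEM_HOST}` first; `storage.backlog.mem_limit` only bounds memory used to replay the backlog, so bound disk use with the output's `storage.total_limit_size` and size the volume for the longest outage you must ride out
Duplicate events after a restart	At-least-once retry re-sent buffered batch	Use `event_id` as the document `_id` (for example via the `es` output's `Id_Key`) to dedupe on ingest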
Logs rejected with `mapping conflict`	Schema drift in a field type	Pin an index template defining `actor` and `outcome` types before ingest
Old logs not expiring	ILM `delete` phase `min_age` misconfigured	Verify `min_age` matches the retention window and rollover is firing
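
The duplicate-events row relies on idempotent writes: if each event is indexed with the create operation and its event_id as the document _id, a replayed event is rejected with a 409 conflict instead of being stored twice. The sketch below builds an Elasticsearch bulk request body in that shape and tallies a response. The helper names are hypothetical and the response is a hand-written sample; no request is sent.

```python
import json


def bulk_create_body(events: list, index: str = "portal-audit") -> str:
    """Build an NDJSON bulk body that creates each event under _id = event_id."""
    lines = []
    for event in events:
        action = {"create": {"_index": index, "_id": event["event_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def count_new_and_duplicate(bulk_response: dict) -> tuple:
    """Count created documents and duplicates rejected with 409."""
    new = duplicate = 0
    for item in bulk_response["items"]:
        status = item["create"]["status"]
        if status == 201:
            new += 1
        elif status == 409:
            duplicate += 1  # already indexed: safe to treat as delivered
        else:
            raise RuntimeError(f"unexpected bulk item status {status}")
    return new, duplicate


body = bulk_create_body([{"event_id": "evt-1", "action": "rbac.policy_update"}])

# Hand-written sample response: one new event, one replayed duplicate.
sample_response = {
    "errors": True,
    "items": [
        {"create": {"_id": "evt-2", "status": 201, "result": "created"}},
        {"create": {"_id": "evt-1", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
    ],
}
counts = count_new_and_duplicate(sample_response)
print(counts)  # prints (1, 1)
```

The create operation also fits the write-only role from step 4, since create_doc permits adding new documents but not overwriting existing ones.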

Frequently Asked Questions

Should the portal push to the SIEM directly or go through a shipper?

Use a shipper. A dedicated agent like Fluent Bit adds an on-disk buffer, batching, and retry, so a SIEM outage never blocks the portal or loses events. Pushing directly from application code couples request latency to SIEM availability and drops events on failure.

How do I guarantee delivery if the SIEM is temporarily down?

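Enable filesystem buffering (storage.type filesystem on the input, with storage.path and a backlog limit set under [SERVICE]) and make sure the output's Retry_Limit is high enough, or set to False, so chunks are not discarded partway through an outage. The shipper persists undelivered batches to disk and replays them when the SIEM recovers, giving at-least-once delivery; deduplicate on event_id at the index to absorb the resulting retries.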
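
The buffering behavior described in this answer can be illustrated with a toy model. The sketch below is not how Fluent Bit is implemented; it is a simplified stand-in, with a hypothetical DiskSpool class, that writes one file per batch and deletes a file only after a send succeeds.

```python
import json
import tempfile
from pathlib import Path


class DiskSpool:
    """Toy filesystem buffer: one file per undelivered batch."""

    def __init__(self, directory: Path):
        self.directory = directory
        self._seq = 0

    def enqueue(self, batch: list) -> None:
        """Persist a batch to disk before any delivery attempt."""
        self._seq += 1
        path = self.directory / f"{self._seq:08d}.json"
        path.write_text(json.dumps(batch))

    def flush(self, send) -> int:
        """Deliver batches oldest first; stop at the first failure."""
        delivered = 0
        for path in sorted(self.directory.glob("*.json")):
            batch = json.loads(path.read_text())
            try:
                send(batch)
            except ConnectionError:
                break  # destination down: keep this and later batches on disk
            path.unlink()  # delete only after a successful send
            delivered += 1
        return delivered


received = []
siem = {"up": False}  # simulated destination state


def send(batch: list) -> None:
    if not siem["up"]:
        raise ConnectionError("SIEM unreachable")
    received.extend(batch)


with tempfile.TemporaryDirectory() as tmp:
    spool = DiskSpool(Path(tmp))
    spool.enqueue([{"event_id": "evt-1"}])
    spool.enqueue([{"event_id": "evt-2"}])
    during_outage = spool.flush(send)   # nothing delivered, nothing lost
    siem["up"] = True
    after_recovery = spool.flush(send)  # backlog replayed in order

received_ids = [event["event_id"] for event in received]
print(during_outage, after_recovery, received_ids)
# prints 0 2 ['evt-1', 'evt-2']
```

The gap between send and unlink is where duplicates come from: a crash after the send succeeds but before the file is removed causes the batch to be replayed on restart, which is why deduplication at the index is still needed.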

What retention satisfies common compliance frameworks?

SOC 2 typically expects at least 12 months of accessible logs, so an ILM policy that keeps roughly 13 months before deletion gives a safe margin. ISO 27001 and HIPAA may require longer archival; tier older logs to cold or frozen storage to control cost while meeting the window.