Measuring Developer Experience with DORA Metrics

DORA metrics turn a vague sense of “delivery velocity” into four numbers you can track, alert on, and tie back to platform investment. This how-to walks through computing deployment frequency, lead time for changes, and change failure rate from pipeline data, notes what mean time to restore additionally needs from your incident system, and shows how to expose the results per catalog entity. It is the delivery half of the broader Developer Experience Metrics capability within Developer Experience & Self-Service Platforms, and pairs naturally with Instrumenting Portal Usage Analytics for the adoption side of the story.

Prerequisites

  • Backstage >= 1.20.0 with a catalog that already models your deployable components.
  • Prometheus >= 2.45 (or a compatible TSDB) reachable from your exporter.
  • CI/CD API access: a token scoped to read workflow runs and deployments from GitHub Actions or GitLab CI.
  • Node.js >= 20.x for the exporter, with prom-client@^15.0.0.
  • A convention that every deployment is tagged with the catalog entityRef of the component it ships.

Exact Configuration

1. Define each metric precisely

Metric | Definition | Source data
Deployment frequency | Successful production deploys per component per day | Deploy events
Lead time for changes | Median time from commit to production deploy | Commit + deploy timestamps
Change failure rate | Share of deploys causing a rollback or incident | Deploy + incident events
Mean time to restore | Median time from incident open to resolved (reported as a median despite the conventional name, so one long outage does not dominate) | Incident events
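
To make the definitions concrete, the sketch below computes all four numbers from in-memory event lists. The DeployRecord and IncidentRecord shapes and the doraSummary name are hypothetical, invented for illustration. The failure rate here counts only rollback outcomes, which simplifies the "rollback or incident" definition in the table.

```typescript
// Hypothetical event shapes for illustration; adapt to your pipeline's payloads.
interface DeployRecord {
  outcome: 'success' | 'rollback';
  firstCommitTs: number; // epoch seconds
  deployedTs: number; // epoch seconds
}

interface IncidentRecord {
  openedTs: number; // epoch seconds
  resolvedTs: number; // epoch seconds
}

// Median of a list of numbers; returns NaN for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Computes the four metrics for one component over a window of `windowDays`.
function doraSummary(deploys: DeployRecord[], incidents: IncidentRecord[], windowDays: number) {
  const successes = deploys.filter((d) => d.outcome === 'success');
  const rollbacks = deploys.filter((d) => d.outcome === 'rollback');
  return {
    deploysPerDay: successes.length / windowDays,
    medianLeadTimeSeconds: median(successes.map((d) => d.deployedTs - d.firstCommitTs)),
    // Simplification: only rollbacks count as failures in this sketch.
    changeFailureRate: deploys.length === 0 ? NaN : rollbacks.length / deploys.length,
    medianTimeToRestoreSeconds: median(incidents.map((i) => i.resolvedTs - i.openedTs)),
  };
}
```

With three successful deploys and one rollback in a 7-day window, deploysPerDay is 3/7 and changeFailureRate is 0.25.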

2. Emit deploy counters from the exporter

// exporter/metrics.ts
// Requires prom-client >= 15.0.0
import { Counter, Histogram, register } from 'prom-client';

export const deploysTotal = new Counter({
  name: 'dora_deploys_total',
  help: 'Production deployments, labeled by outcome',
  labelNames: ['entity_ref', 'team', 'outcome'] as const,
});

export const leadTime = new Histogram({
  name: 'dora_lead_time_seconds',
  help: 'Commit-to-deploy lead time',
  labelNames: ['entity_ref', 'team'] as const,
  // Buckets: 1h, 4h, 1d, 3d, 1w
  buckets: [3600, 14400, 86400, 259200, 604800],
});

export const metricsHandler = async () => register.metrics();

3. Translate a deploy webhook into metrics

// exporter/handler.ts
// Requires Node.js >= 20.x
import { deploysTotal, leadTime } from './metrics';

export async function onDeploy(event: {
  entityRef: string;
  team: string;
  outcome: 'success' | 'rollback';
  firstCommitTs: number; // epoch seconds of the earliest commit in the change
  deployedTs: number; // epoch seconds of the production deploy
}) {
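  const labels = { entity_ref: event.entityRef, team: event.team };
  // Count every deploy, split by outcome, so the failure rate can be derived later.
  deploysTotal.inc({ ...labels, outcome: event.outcome });
  // Only successful deploys contribute to lead time.
  if (event.outcome === 'success') {
    // Clamp at zero so clock skew cannot record a negative duration.
    leadTime.observe(labels, Math.max(0, event.deployedTs - event.firstCommitTs));
  }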
}

4. Scrape the exporter

# prometheus.yml
# Requires Prometheus >= 2.45
scrape_configs:
  - job_name: deploy-exporter
    metrics_path: /metrics
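    static_configs:
      # Prometheus does not expand environment variables in this file;
      # substitute the placeholder when you render or template the config.
      - targets: ['${DEPLOY_EXPORTER_HOST}:9102']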

5. Compute the pipeline-derived metrics with PromQL

# Deployment frequency: successful deploys/day per component (last 7d)
sum by (entity_ref) (rate(dora_deploys_total{outcome="success"}[7d])) * 86400

# Lead time: median, per team
histogram_quantile(0.5, sum by (le, team) (rate(dora_lead_time_seconds_bucket[7d])))

# Change failure rate: rollbacks / all deploys
sum(rate(dora_deploys_total{outcome="rollback"}[7d]))
  / sum(rate(dora_deploys_total[7d]))
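
The median lead-time query is an estimate, not an exact median: for classic histograms, histogram_quantile locates the bucket that contains the target rank and interpolates linearly inside it. The sketch below mirrors that arithmetic in simplified form to show how bucket boundaries shape the result. The Bucket type and estimateQuantile name are hypothetical, and the sketch handles only 0 < q <= 1.

```typescript
// One cumulative bucket: the number of observations <= le (upper bound in seconds).
interface Bucket {
  le: number; // use Infinity for the +Inf bucket
  count: number; // cumulative count
}

// Simplified estimate: find the bucket holding the target rank, then
// interpolate linearly between that bucket's lower and upper bounds.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (!(q > 0 && q <= 1)) return NaN; // sketch only covers 0 < q <= 1
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  // A usable histogram needs at least two buckets, the last one being +Inf.
  if (sorted.length < 2 || sorted[sorted.length - 1].le !== Infinity) return NaN;
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const idx = sorted.findIndex((b) => b.count >= rank);
  // Rank falls in the +Inf bucket: report the highest finite bound.
  if (idx === sorted.length - 1) return sorted[sorted.length - 2].le;
  const upper = sorted[idx].le;
  const lower = idx === 0 ? 0 : sorted[idx - 1].le;
  const below = idx === 0 ? 0 : sorted[idx - 1].count;
  const inBucket = sorted[idx].count - below;
  return lower + (upper - lower) * ((rank - below) / inBucket);
}
```

With the step 2 buckets and cumulative counts of 4, 8, 9, 10, 10, and 10 (the last for +Inf), the median rank is 5, which lands a quarter of the way into the 1h to 4h bucket, so the estimate is 6300 seconds. Any lead time longer than a week can only be reported as 604800, which is worth remembering when choosing bucket boundaries.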

Surface these on the component’s entity page so each metric sits next to its owner. The entity-page card pattern is covered in Building a Custom Entity Page Card in Backstage.

Validation

# Requires curl, jq, and Prometheus >= 2.45
# 1. Exporter exposes the counters
curl -s "${DEPLOY_EXPORTER_HOST}:9102/metrics" | grep -c "dora_deploys_total"
# Expected: >= 1

# 2. Deploy frequency query returns a value per component
curl -s "${PROMETHEUS_URL}/api/v1/query" \
  --data-urlencode 'query=sum by (entity_ref) (rate(dora_deploys_total{outcome="success"}[7d])) * 86400' \
  | jq '.data.result | length'
# Expected: one entry per active component

# 3. Lead-time histogram has observations
curl -s "${PROMETHEUS_URL}/api/v1/query?query=dora_lead_time_seconds_count" \
  | jq '.data.result[0].value[1]'
# Expected: a non-zero count
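
Check 2 only counts results. To see which components are missing, one option is to compare the entity_ref labels in the query response against the list you expect from the catalog. The sketch below assumes the standard instant-vector response shape of the Prometheus HTTP API; the missingEntityRefs name and the example refs are hypothetical.

```typescript
// Shape of an instant-vector response from the Prometheus HTTP API (/api/v1/query).
interface InstantVectorResponse {
  status: string;
  data: {
    resultType: string;
    result: { metric: Record<string, string>; value: [number, string] }[];
  };
}

// Returns the expected entity refs that have no sample in the response.
function missingEntityRefs(response: InstantVectorResponse, expected: string[]): string[] {
  const seen = new Set(response.data.result.map((r) => r.metric.entity_ref));
  return expected.filter((ref) => !seen.has(ref));
}
```

An empty return value means every expected component reported at least one sample in the query window.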

Edge Cases & Troubleshooting

Symptom | Root Cause | Resolution
Lead time spikes to weeks | Long-lived feature branches inflate commit-to-deploy span | Measure from first commit of the merged change, not branch creation; consider trunk-based development
Change failure rate reads 0% | Rollbacks not emitting events | Instrument the rollback path explicitly; a missing signal is not a healthy signal
Metrics missing for some services | Deploys not tagged with entityRef | Enforce the tag in the deploy pipeline; drop untagged samples at the exporter
Deployment frequency looks too high | Non-prod deploys counted | Filter to the production environment label before incrementing the counter
MTTR unavailable | No incident system integration | Feed incident open/resolve webhooks into the same exporter before reporting MTTR
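
Several of these resolutions come down to rejecting bad events before they reach a counter. A minimal sketch of such a guard follows, assuming the raw webhook carries an environment field and that entityRef may be absent. RawDeployEvent, Verdict, and checkDeployEvent are hypothetical names, not part of the exporter shown earlier.

```typescript
// Hypothetical raw event: the `environment` field and the optional `entityRef`
// are assumptions about what the deploy pipeline sends.
interface RawDeployEvent {
  entityRef?: string;
  environment: string;
  firstCommitTs: number; // epoch seconds
  deployedTs: number; // epoch seconds
}

type Verdict = { record: true } | { record: false; reason: string };

// Decides whether an event should be turned into metrics, and says why not.
function checkDeployEvent(event: RawDeployEvent): Verdict {
  if (!event.entityRef || event.entityRef.trim() === '') {
    return { record: false, reason: 'missing entityRef' };
  }
  if (event.environment !== 'production') {
    return { record: false, reason: 'non-production environment' };
  }
  if (event.deployedTs < event.firstCommitTs) {
    return { record: false, reason: 'deploy precedes first commit' };
  }
  return { record: true };
}
```

Logging or counting the returned reason makes dropped events visible, so a gap in the dashboards can be traced to untagged or non-production deploys rather than guessed at.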

Frequently Asked Questions

Are DORA metrics meaningful for a team that deploys weekly?

Yes. DORA is about trend and stability, not absolute speed. A weekly-deploying team with a 2% change failure rate and a one-hour MTTR is in good shape; the metrics flag regressions in that team’s own baseline rather than ranking it against others.

How do we compute lead time without a commercial tool?

Join two timestamps you already have: the earliest commit in a merged change and the production deploy event. Both are available from your VCS and CI/CD APIs. The exporter shown above does exactly this with a Prometheus histogram.
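
As a rough illustration of that join, the sketch below takes the commit timestamps of a merged change and the deploy timestamp, both as ISO 8601 strings, and returns the lead time in seconds. The leadTimeSeconds name and the input format are assumptions; use whatever timestamp format your VCS and CI/CD APIs actually return.

```typescript
// Lead time in seconds from the earliest commit in a change to its deploy.
// Inputs are assumed to be ISO 8601 timestamps.
function leadTimeSeconds(commitTimestamps: string[], deployedAt: string): number {
  if (commitTimestamps.length === 0) {
    throw new Error('a change needs at least one commit');
  }
  // Math.min propagates NaN, so one unparseable commit timestamp is caught below.
  const firstCommitMs = Math.min(...commitTimestamps.map((t) => Date.parse(t)));
  const deployedMs = Date.parse(deployedAt);
  if (Number.isNaN(firstCommitMs) || Number.isNaN(deployedMs)) {
    throw new Error('unparseable timestamp');
  }
  return (deployedMs - firstCommitMs) / 1000;
}
```

For commits at 2024-03-01T09:00:00Z and 2024-03-01T15:30:00Z deployed at 2024-03-02T09:00:00Z, the result is 86400 seconds, measured from the earlier commit.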