Measuring Developer Experience with DORA Metrics

DORA metrics turn a vague sense of “delivery velocity” into four numbers you can track, alert on, and tie back to platform investment. This how-to walks through computing deployment frequency, lead time for changes, change failure rate, and mean time to restore from pipeline data, and exposing them per catalog entity. It is the delivery half of the broader Developer Experience Metrics capability within Developer Experience & Self-Service Platforms, and pairs naturally with Instrumenting Portal Usage Analytics for the adoption side of the story.

Prerequisites

Computing DORA metrics needs a catalog that models deployable components, a time-series store, CI/CD API access for deploy data, a Node.js runtime for the exporter, and a convention that every deploy is tagged with the component’s entityRef.

The entityRef tag is the join key — without it, deploy data cannot be attributed to a component or its owner.

Backstage >= 1.20.0 with a catalog that already models your deployable components.
Prometheus >= 2.45 (or a compatible TSDB) reachable from your exporter.
CI/CD API access: a token scoped to read workflow runs and deployments from GitHub Actions or GitLab CI.
Node.js >= 20.x for the exporter, with prom-client@^15.0.0.
A convention that every deployment is tagged with the catalog entityRef of the component it ships.

Exact Configuration

The pipeline is a straight line: a deploy webhook increments exporter counters and a lead-time histogram, Prometheus scrapes them, and PromQL turns the raw series into the four DORA numbers surfaced on each entity page.

A histogram — not a gauge — is what lets you compute a real median lead time with histogram_quantile.
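To make the histogram point concrete, here is a minimal sketch of the kind of linear interpolation histogram_quantile performs over classic cumulative buckets. It is a simplified illustration, not Prometheus's actual implementation; the `Bucket` type and `estimateQuantile` function are hypothetical names introduced only for this example.

```typescript
// Hypothetical shape: one cumulative bucket as exposed on /metrics,
// i.e. the number of observations less than or equal to `le`.
interface Bucket {
  le: number;    // upper bound in seconds; Infinity for the +Inf bucket
  count: number; // cumulative count
}

// Estimate quantile q (0..1) by interpolating linearly inside the bucket
// that contains the target rank. Returns NaN when nothing was observed.
export function estimateQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted.length > 0 ? sorted[sorted.length - 1].count : 0;
  if (total === 0) return NaN;

  const rank = q * total;
  const idx = sorted.findIndex((b) => b.count >= rank);
  const bucket = sorted[idx];

  // The rank falls beyond every finite bound: report the highest finite bound.
  if (bucket.le === Infinity) {
    return idx > 0 ? sorted[idx - 1].le : NaN;
  }

  const lowerBound = idx > 0 ? sorted[idx - 1].le : 0;
  const countBelow = idx > 0 ? sorted[idx - 1].count : 0;
  const inBucket = bucket.count - countBelow;
  if (inBucket === 0) return lowerBound;

  return lowerBound + (bucket.le - lowerBound) * ((rank - countBelow) / inBucket);
}
```

With cumulative counts of 2, 6, and 10 in the 1h, 4h, and 1d buckets, the median estimate is 11700 seconds (3.25 hours). The result is only as precise as the bucket boundaries, and it can never exceed the highest finite bound, so bucket choice deserves some care.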

1. Define each metric precisely

Metric	Definition	Source data
Deployment frequency	Successful production deploys per component per day	Deploy events
Lead time for changes	Median time from commit to production deploy	Commit + deploy timestamps
Change failure rate	Share of deploys causing a rollback or incident	Deploy + incident events
Mean time to restore	Median time from incident open to resolved	Incident events
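The definitions in the table can be sanity-checked against raw events before any Prometheus wiring exists. The sketch below computes all four numbers in memory. The event shapes and the `doraSummary` name are assumptions made for illustration, and change failure rate counts rollbacks only, since that is the failure signal the rest of this how-to instruments.

```typescript
// Hypothetical event shapes; field names are assumptions for illustration.
interface DeployEvent {
  entityRef: string;
  outcome: 'success' | 'rollback';
  firstCommitTs: number; // epoch seconds
  deployedTs: number;    // epoch seconds
}

interface IncidentEvent {
  entityRef: string;
  openedTs: number;   // epoch seconds
  resolvedTs: number; // epoch seconds
}

// Median of a list of numbers; NaN for an empty list.
export function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Compute the four metrics over a window of `windowDays` days.
export function doraSummary(
  deploys: DeployEvent[],
  incidents: IncidentEvent[],
  windowDays: number,
) {
  const successes = deploys.filter((d) => d.outcome === 'success');
  const rollbacks = deploys.filter((d) => d.outcome === 'rollback');
  return {
    deploysPerDay: successes.length / windowDays,
    medianLeadTimeSeconds: median(
      successes.map((d) => d.deployedTs - d.firstCommitTs),
    ),
    // Rollbacks only; incident-caused failures would need a join on incidents.
    changeFailureRate:
      deploys.length === 0 ? NaN : rollbacks.length / deploys.length,
    medianTimeToRestoreSeconds: median(
      incidents.map((i) => i.resolvedTs - i.openedTs),
    ),
  };
}
```

Running this over a small export of real deploy events is a cheap way to cross-check what the PromQL queries later report for the same window.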

2. Emit deploy counters from the exporter

// exporter/metrics.ts
// Requires prom-client >= 15.0.0
import { Counter, Histogram, register } from 'prom-client';

export const deploysTotal = new Counter({
  name: 'dora_deploys_total',
  help: 'Production deployments, labeled by outcome (success or rollback)',
  labelNames: ['entity_ref', 'team', 'outcome'] as const,
});

export const leadTime = new Histogram({
  name: 'dora_lead_time_seconds',
  help: 'Commit-to-deploy lead time',
  labelNames: ['entity_ref', 'team'] as const,
  // Buckets: 1h, 4h, 1d, 3d, 1w
  buckets: [3600, 14400, 86400, 259200, 604800],
});

export const metricsHandler = async () => register.metrics();

3. Translate a deploy webhook into metrics

// exporter/handler.ts
// Requires Node.js >= 20.x
import { deploysTotal, leadTime } from './metrics';

export async function onDeploy(event: {
  entityRef: string;
  team: string;
  outcome: 'success' | 'rollback';
  firstCommitTs: number; // epoch seconds of the earliest commit in the change
  deployedTs: number; // epoch seconds of the production deploy
}) {
  const labels = { entity_ref: event.entityRef, team: event.team };
  deploysTotal.inc({ ...labels, outcome: event.outcome });
  if (event.outcome === 'success') {
    leadTime.observe(labels, event.deployedTs - event.firstCommitTs);
  }
}

4. Scrape the exporter

# prometheus.yml
# Requires Prometheus >= 2.45
scrape_configs:
  - job_name: deploy-exporter
    metrics_path: /metrics
    static_configs:
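      # ${DEPLOY_EXPORTER_HOST} is a placeholder: Prometheus does not expand
      # environment variables in this file, so substitute the real host first.
      - targets: ['${DEPLOY_EXPORTER_HOST}:9102']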

5. Compute the four metrics with PromQL

# Deployment frequency: successful deploys/day per component (last 7d)
sum by (entity_ref) (rate(dora_deploys_total{outcome="success"}[7d])) * 86400

# Lead time: median, per team
histogram_quantile(0.5, sum by (le, team) (rate(dora_lead_time_seconds_bucket[7d])))

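# Change failure rate: rollbacks / all deploys
sum(rate(dora_deploys_total{outcome="rollback"}[7d]))
  / sum(rate(dora_deploys_total[7d]))

# Mean time to restore: not computable from the series above; it needs
# incident open/resolve events fed into the exporter first.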

Surface these on the component’s entity page so each metric sits next to its owner. The entity-page card pattern is covered in Building a Custom Entity Page Card in Backstage.

Validate the chain in order: the exporter exposes the counters, the deployment-frequency query returns a value per component, and the lead-time histogram has real observations.

A zero histogram count means lead time is not being observed — check that successful deploys call observe.

# Requires Prometheus >= 2.45
# 1. Exporter exposes the counters
curl -s "${DEPLOY_EXPORTER_HOST}:9102/metrics" | grep -c "dora_deploys_total"
# Expected: >= 1

# 2. Deploy frequency query returns a value per component
curl -s "${PROMETHEUS_URL}/api/v1/query" \
  --data-urlencode 'query=sum by (entity_ref) (rate(dora_deploys_total{outcome="success"}[7d])) * 86400' \
  | jq '.data.result | length'
# Expected: one entry per active component

# 3. Lead-time histogram has observations
curl -s "${PROMETHEUS_URL}/api/v1/query?query=dora_lead_time_seconds_count" \
  | jq '.data.result[0].value[1]'
# Expected: a non-zero count

Most DORA data problems are measurement errors, not real regressions: a missing rollback signal reads as perfect stability, an untagged deploy vanishes, and a non-prod deploy inflates frequency. The table below lists each symptom with its root cause and whether the fix is a signal to add or a filter to apply.

A 0% change-failure rate is almost always a missing rollback signal — a missing signal is not a healthy one.

Symptom	Root Cause	Resolution
Lead time spikes to weeks	Long-lived feature branches inflate commit-to-deploy span	Measure from first commit of the merged change, not branch creation; consider trunk-based development
Change failure rate reads 0%	Rollbacks not emitting events	Instrument the rollback path explicitly; a missing signal is not a healthy signal
Metrics missing for some services	Deploys not tagged with entityRef	Enforce the tag in the deploy pipeline; drop untagged samples at the exporter
Deployment frequency looks too high	Non-prod deploys counted	Filter to the production environment label before incrementing the counter
MTTR unavailable	No incident system integration	Feed incident open/resolve webhooks into the same exporter before reporting MTTR
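Two of the resolutions in the table, dropping untagged deploys and filtering to production, can live in a single guard that runs before any counter is incremented. The following is a minimal sketch under stated assumptions: the `RawDeploy` payload shape, its `environment` field, and the `classifyDeploy` name are all hypothetical and would need to match whatever your deploy webhook actually sends.

```typescript
// Hypothetical webhook payload; the `environment` field is an assumption.
interface RawDeploy {
  entityRef?: string;
  environment?: string;
  outcome?: string;
}

type Decision =
  | { record: true; entityRef: string; outcome: 'success' | 'rollback' }
  | { record: false; reason: string };

// Decide whether a deploy event should reach the metrics at all.
export function classifyDeploy(
  raw: RawDeploy,
  prodEnv = 'production',
): Decision {
  // Untagged deploys cannot be attributed to a component or owner.
  if (!raw.entityRef || raw.entityRef.trim() === '') {
    return { record: false, reason: 'missing entityRef' };
  }
  // Non-production deploys would inflate deployment frequency.
  if (raw.environment !== prodEnv) {
    return { record: false, reason: 'non-production environment' };
  }
  // Only the two outcomes the exporter knows about are counted.
  if (raw.outcome !== 'success' && raw.outcome !== 'rollback') {
    return { record: false, reason: 'unknown outcome' };
  }
  return { record: true, entityRef: raw.entityRef, outcome: raw.outcome };
}
```

Returning a reason rather than silently discarding the event makes it possible to log or count rejected deploys, which helps when a service is unexpectedly missing from the dashboards.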

Frequently Asked Questions

Are DORA metrics meaningful for a team that deploys weekly?

Yes. DORA is about trend and stability, not absolute speed. A weekly-deploying team with a 2% change failure rate and a one-hour MTTR is in good shape; the metrics flag regressions in that team’s own baseline rather than ranking it against others.

How do we compute lead time without a commercial tool?

Join two timestamps you already have: the earliest commit in a merged change and the production deploy event. Both are available from your VCS and CI/CD APIs. The exporter shown above does exactly this with a Prometheus histogram.
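As a rough illustration of that join, the sketch below takes the commits included in a merged change and the deploy timestamp, and returns the lead time in seconds. The `Commit` and `Deploy` shapes and the `leadTimeSeconds` name are assumptions for this example; real VCS and CI/CD API responses will need mapping into them.

```typescript
// Hypothetical shapes; real API responses need mapping into these.
interface Commit {
  sha: string;
  committedAt: string; // ISO 8601 timestamp
}

interface Deploy {
  entityRef: string;
  deployedAt: string; // ISO 8601 timestamp
  commits: Commit[];  // commits shipped by this deploy
}

// Lead time = production deploy time minus the earliest commit time.
// Returns undefined when the inputs cannot yield a sensible value.
export function leadTimeSeconds(deploy: Deploy): number | undefined {
  if (deploy.commits.length === 0) return undefined;
  const earliestMs = Math.min(
    ...deploy.commits.map((c) => Date.parse(c.committedAt)),
  );
  const deployedMs = Date.parse(deploy.deployedAt);
  const seconds = (deployedMs - earliestMs) / 1000;
  // Unparseable timestamps give NaN; clock skew can give negatives.
  return Number.isFinite(seconds) && seconds >= 0 ? seconds : undefined;
}
```

The returned value is what would be passed to the histogram's observe call; skipping undefined results keeps malformed events from distorting the median.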