Integrating GitHub Actions with Backstage Catalog: Automated Entity Registration & Sync
Platform engineering teams frequently encounter stale service metadata when relying solely on static repository scans. Integrating GitHub Actions with Backstage catalog resolves this by triggering real-time entity ingestion and validation during CI/CD pipelines. By leveraging the Plugin Ecosystem & Custom Extensions, organizations can automate the generation, validation, and registration of catalog-info.yaml files before they reach production. This guide provides a precise configuration workflow to eliminate manual catalog drift and enforce metadata compliance at scale.
Context: Why Automate Catalog Ingestion via CI/CD?
Manual catalog updates introduce latency and human error. When integrating GitHub Actions with Backstage, the goal is to shift metadata validation left. GitHub Actions intercepts pull requests, validates catalog-info.yaml schemas, and publishes approved entities directly to the Backstage API. This approach aligns with modern Catalog Integration Patterns that prioritize automated, policy-driven service onboarding over periodic polling, ensuring the developer portal reflects the exact state of deployed infrastructure.
Configuration: GitHub Actions Workflow & Backstage Setup
Deploy a dedicated workflow that executes on push to the default branch. The workflow authenticates using a Backstage API token, then executes the Backstage CLI to validate and register entities. Ensure your app-config.yaml enables the GitHub provider with the correct organization filters.
Backstage Configuration (app-config.yaml)
    catalog:
      providers:
        github:
          providerId:
            organization: 'my-org'
            catalogPath: '/catalog-info.yaml'
            schedule:
              frequency: { minutes: 30 }
              timeout: { minutes: 3 }
      locations:
        - type: file
          target: ./catalog-info.yaml
GitHub Actions Workflow (.github/workflows/catalog-sync.yml)
    name: Sync Backstage Catalog
    on:
      push:
        branches: [main]
        paths:
          - '**/catalog-info.yaml'
    jobs:
      validate-and-register:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Setup Node
            uses: actions/setup-node@v4
            with:
              node-version: 18
          - name: Validate Catalog
            run: npx @backstage/cli catalog validate --path .
          - name: Register Entity
            env:
              BACKSTAGE_TOKEN: ${{ secrets.BACKSTAGE_API_TOKEN }}
            run: |
              curl -s -X POST "https://<BACKSTAGE_BASE_URL>/api/catalog/locations" \
                -H "Authorization: Bearer $BACKSTAGE_TOKEN" \
                -H "Content-Type: application/json" \
                -d '{"type": "url", "target": "https://github.com/org/repo/blob/main/catalog-info.yaml"}'
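The Register Entity step above hardcodes the repository URL and, because curl runs with -s only, never inspects the HTTP status. The following is a minimal sketch of one way to tighten that, not the definitive approach. It assumes BACKSTAGE_BASE_URL is a host name without scheme (matching the placeholder used in this guide), and that the catalog answers 409 when the location is already registered, which would be expected on every push after the first one; confirm both against your Backstage version. The function names are hypothetical, and the payload builder does no JSON escaping, so it only suits plain repository names and paths.

```shell
#!/usr/bin/env sh
# Sketch only: function names are illustrative, not part of any Backstage tooling.

# Print the JSON body for a "url" location.
# $1 = owner/repo, $2 = branch, $3 = path of the descriptor inside the repo
location_payload() {
  printf '{"type":"url","target":"https://github.com/%s/blob/%s/%s"}' "$1" "$2" "$3"
}

# Map an HTTP status code to a step outcome.
# Assumption: 409 means "location already registered" and is not an error.
classify_status() {
  case "$1" in
    200|201) echo "registered" ;;
    409)     echo "already-registered" ;;
    *)       echo "failed" ;;
  esac
}

# POST the location and return non-zero only on unexpected statuses.
# Expects BACKSTAGE_BASE_URL (host, no scheme) and BACKSTAGE_TOKEN in the environment.
register_location() (
  body=$(location_payload "$1" "$2" "$3")
  status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    "https://${BACKSTAGE_BASE_URL}/api/catalog/locations" \
    -H "Authorization: Bearer $BACKSTAGE_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$body")
  outcome=$(classify_status "$status")
  echo "HTTP $status: $outcome"
  [ "$outcome" != "failed" ]
)
```

Inside the workflow, the call could look like register_location "$GITHUB_REPOSITORY" "$GITHUB_REF_NAME" catalog-info.yaml, using the default environment variables that GitHub Actions provides on push events.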
Validation & Troubleshooting
After merging the workflow, verify entity ingestion immediately and establish rollback procedures.
Rapid Verification Steps
- API Query: Confirm ingestion via direct endpoint request:

      curl -s "https://<BACKSTAGE_BASE_URL>/api/catalog/entities/by-name/component/default/<entity-name>" \
        -H "Authorization: Bearer $BACKSTAGE_TOKEN" | jq '.metadata.annotations'

- Local Replication: Replicate CI validation locally before merging PRs:

      npx @backstage/cli catalog validate --path ./path/to/catalog-info.yaml

- UI Confirmation: Navigate to the Backstage catalog search and verify the entity status is active.
Troubleshooting & Rollback
| Symptom | Root Cause | Resolution |
|---|---|---|
| 401 Unauthorized | Expired or insufficient BACKSTAGE_API_TOKEN | Regenerate the token and confirm it grants read and write access to the catalog API. |
| 422 Unprocessable Entity | Malformed YAML or missing required annotations | Run npx @backstage/cli catalog validate locally to isolate schema violations. |
| 404 Not Found | Misconfigured API gateway or reverse proxy routing for /api/catalog/locations | Verify the base URL and reverse proxy routing for the catalog endpoint. CORS applies only to browser requests, so it does not affect curl calls made from CI. |
| Silent Rejection | Missing annotation that a custom processor or policy requires, such as backstage.io/techdocs-ref or backstage.io/kubernetes-id | Add the required annotations to catalog-info.yaml and re-run the workflow. |
Immediate Rollback Command: Remove a faulty location registration to prevent catalog pollution:
    # List registered locations to find the ID
    # (assumes the response is a JSON array of {"data": {"id", "type", "target"}} objects)
    curl -s "https://<BACKSTAGE_BASE_URL>/api/catalog/locations" \
      -H "Authorization: Bearer $BACKSTAGE_TOKEN" | jq '.[] | {id: .data.id, target: .data.target}'
    # Delete by ID
    curl -X DELETE "https://<BACKSTAGE_BASE_URL>/api/catalog/locations/<LOCATION_ID>" \
      -H "Authorization: Bearer $BACKSTAGE_TOKEN"
Edge Cases & Advanced Scenarios
- Monorepo Path Filtering: Prevent duplicate entity creation by restricting workflow triggers using paths or dorny/paths-filter. Target only directories containing valid catalog-info.yaml files.
- Rate Limit Management: For large-scale organizations, implement exponential backoff and use GitHub App tokens instead of PATs to maximize rate limits. GitHub App installation tokens can receive higher API rate limits than PATs. They are also short-lived (each expires after one hour) and minted on demand, so there is no long-lived credential to rotate.
- Network Policy Fallbacks: If webhook delivery fails due to strict egress rules, fall back to scheduled polling in app-config.yaml with a reduced frequency (frequency: { minutes: 60 }).
- Strict Schema Enforcement: Deploy custom Backstage processors to reject malformed entities during catalog processing, before they propagate to the catalog UI.
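The exponential backoff mentioned under Rate Limit Management can be sketched in plain shell as shown here. This is an illustrative helper under stated assumptions, not something Backstage or GitHub ships: the function names, and the RETRY_BASE and RETRY_MAX variables, are hypothetical. The delay doubles on each attempt and is capped at a maximum.

```shell
#!/usr/bin/env sh
# Sketch only: helper names and the RETRY_* variables are illustrative.

# Print the delay in seconds before retry number $1 (0-based):
# base * 2^attempt, capped at a maximum.
# $2 = base delay in seconds, $3 = maximum delay in seconds
backoff_delay() (
  delay=$2
  i=0
  while [ "$i" -lt "$1" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge "$3" ]; then
      delay=$3
      break
    fi
    i=$((i + 1))
  done
  # The cap also applies when the base itself exceeds the maximum.
  if [ "$delay" -gt "$3" ]; then delay=$3; fi
  echo "$delay"
)

# Run a command up to $1 times, sleeping with exponential backoff
# between failed attempts. Returns 0 on the first success, 1 otherwise.
retry() (
  max_attempts=$1
  shift
  attempt=0
  while :; do
    "$@" && exit 0
    attempt=$((attempt + 1))
    [ "$attempt" -ge "$max_attempts" ] && exit 1
    sleep "$(backoff_delay "$((attempt - 1))" "${RETRY_BASE:-2}" "${RETRY_MAX:-60}")"
  done
)
```

A registration call could then be wrapped as retry 5 curl --fail ... so that transient failures are retried after 2, 4, 8, and 16 seconds. Adding random jitter would spread out retries from many repositories and is left out here for brevity.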
Common Pitfalls
- Missing backstage.io/techdocs-ref or backstage.io/kubernetes-id annotations causing silent entity rejection when a custom processor or policy requires them.
- Using personal access tokens (PATs) instead of GitHub App tokens, leading to rate limit exhaustion and webhook delivery failures.
- Overlapping catalogPath glob patterns in monorepos resulting in duplicate entity registration errors.
- Failing to configure API gateway or reverse proxy routing for the /api/catalog/locations endpoint, causing 404 errors during CI registration.
Frequently Asked Questions
How do I handle rate limits when syncing hundreds of repositories?
Implement exponential backoff in your workflow, and schedule full syncs during off-peak hours. Use GitHub App tokens rather than PATs for higher rate limits: a PAT is limited to 5,000 requests/hour, while GitHub App installation tokens can reach up to 15,000 requests/hour on GitHub Enterprise Cloud. For bulk registration, register a single Location entity pointing to a glob pattern rather than individual entities per repository.
Can I trigger Backstage catalog updates only when specific files change?
Yes. Use the dorny/paths-filter action in your workflow to detect changes to catalog-info.yaml or related metadata directories before executing the registration step.
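Where adding a third-party action is not an option, the same check can be approximated in a plain shell step. The sketch below is an alternative to dorny/paths-filter, not a description of how that action works. The function names are hypothetical, and it assumes the checkout contains both commits being compared (for example by setting fetch-depth: 0 on actions/checkout).

```shell
#!/usr/bin/env sh
# Sketch only: function names are illustrative.

# Read file paths on stdin and print only catalog descriptors,
# at the repository root or in any subdirectory.
filter_catalog_files() {
  # "|| true" keeps the step from failing when nothing matches.
  grep -E '(^|/)catalog-info\.yaml$' || true
}

# List catalog descriptors changed between two commits.
# $1 = base commit, $2 = head commit
changed_catalog_files() {
  git diff --name-only "$1" "$2" | filter_catalog_files
}
```

On a push event, the two commits would typically be github.event.before and github.sha; the registration step can then be skipped when changed_catalog_files prints nothing.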
Why are my entities showing as ‘stale’ immediately after registration?
This typically occurs when the backstage.io/managed-by-location annotation is missing or mismatched. Ensure your workflow registers the location URL that exactly matches the location the Backstage backend is polling, so that refresh cycles correctly update the entity’s last-seen timestamp.
Related
- Catalog Integration Patterns: the parent guide on ingestion, processors, and synchronization
- Plugin Ecosystem & Custom Extensions: the section on extension governance and validation
- Scaffolder Template Design: generating the catalog-info.yaml this workflow registers