Langfuse Incident Response Plan

Declaring an incident

Any team member can declare an incident at any time — don't wait for certainty. When in doubt, page.

  1. Log in to incident.io (Slack SSO) → "Declare Incident", or use /incident in the #incidents Slack channel.
  2. Fill in a summary and the affected service(s), and always select the highest severity; this is not published externally.

incident.io will automatically create a Slack channel, page the on-call engineer via PagerDuty, and post to #incidents. On-call schedules are managed in PagerDuty — the on-call engineer's phone will ring within 1–2 minutes.

When to page: platform outages, security issues, elevated errors, a customer seeing another customer's data.

Response

The first engineer to join the incident channel is the Incident Lead. Assign yourself the role in incident.io. Pull in the DRIs (directly responsible individuals) of affected components, or Max, if needed. For customer-facing incidents, pull in someone from the business side to monitor Slack channels and support tickets.

  1. Triage — Collect evidence (screenshots, metrics, logs) and publish a status page update (see below). For critical incidents, enable the product announcement banner.
  2. Mitigate — Restore the system first: rollback, scale up, feature-flag, hotfix. Root cause comes later.
  3. Stabilize — Mark as mitigated in incident.io, update the status page, monitor for 15–30 min, then dissolve the call.

War room call

Keep all incident communication in the incident.io war room call so remote teammates can join quickly and we have a transcript for the post-mortem.

  1. Open the incident Slack channel created by incident.io.
  2. Click ☎️ Join the call in the incident.io message.

Status page

Status pages are extremely important: they are how we show transparency to users, which builds trust. When in doubt, always publish a status page update.

The following should always have a status page update:

  • Eval execution delays
  • Ingestion delays
  • Errors/latencies on public APIs
  • Login issues
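
The rule above can be written down as a small decision helper. This is only an illustrative sketch: the `IncidentKind` enum and the `needs_status_page` function are hypothetical names invented here, not part of incident.io or any Langfuse tooling. It encodes the four always-publish categories plus the "when in doubt, publish" default.

```python
from enum import Enum


class IncidentKind(Enum):
    """Hypothetical incident categories, named after the list above."""
    EVAL_EXECUTION_DELAY = "eval execution delays"
    INGESTION_DELAY = "ingestion delays"
    PUBLIC_API_ERRORS_OR_LATENCY = "errors/latencies on public APIs"
    LOGIN_ISSUES = "login issues"
    OTHER = "other"


# Categories that should always get a public status page update.
ALWAYS_PUBLISH = frozenset({
    IncidentKind.EVAL_EXECUTION_DELAY,
    IncidentKind.INGESTION_DELAY,
    IncidentKind.PUBLIC_API_ERRORS_OR_LATENCY,
    IncidentKind.LOGIN_ISSUES,
})


def needs_status_page(kind: IncidentKind, in_doubt: bool = False) -> bool:
    """Return True if a status page update should be published.

    Listed categories always publish; anything else publishes
    whenever the responder is in doubt.
    """
    return kind in ALWAYS_PUBLISH or in_doubt
```

For example, `needs_status_page(IncidentKind.OTHER, in_doubt=True)` evaluates to `True`, which mirrors the guidance to err on the side of publishing.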

To publish: go to "Status Pages" in incident.io, select our public status page, and hit "Publish Incident". Declaring an incident via /incident in Slack does not automatically create a public status page update — you must publish it separately.

The Incident Lead keeps the status page up to date with concise and accurate information throughout the incident.

Post-mortem

After mitigation, find and fix the root cause. Complete the post-mortem in Linear using the auto-generated timeline, covering: summary, impact, root cause, contributing factors, and action items with owners. Track follow-ups in the Linear ticket. Share in #team-engineering.
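
As a rough illustration of the post-mortem contents listed above, the sections can be modelled as a checklist. This sketch is an assumption for illustration only: `PostMortem`, `ActionItem`, and `missing_sections` are hypothetical names, and nothing here talks to Linear or incident.io. It simply reports which required sections are still empty and whether any action item lacks an owner.

```python
from dataclasses import dataclass, field


@dataclass
class ActionItem:
    """A follow-up task; every item should have an owner."""
    description: str
    owner: str = ""


@dataclass
class PostMortem:
    """Hypothetical container for the sections named in the text."""
    summary: str = ""
    impact: str = ""
    root_cause: str = ""
    contributing_factors: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that still need to be filled in."""
        missing = []
        if not self.summary.strip():
            missing.append("summary")
        if not self.impact.strip():
            missing.append("impact")
        if not self.root_cause.strip():
            missing.append("root cause")
        if not self.contributing_factors:
            missing.append("contributing factors")
        if not self.action_items:
            missing.append("action items")
        elif any(not item.owner.strip() for item in self.action_items):
            missing.append("action item owners")
        return missing
```

A draft is ready to share once `missing_sections()` returns an empty list.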

