What is an Incident Postmortem?
An incident postmortem (also called a post-incident review, PIR, or incident retrospective) is a structured document created after a service disruption, outage, or significant operational event. It answers four core questions: what happened, why it happened, how it was resolved, and what will prevent it from happening again. Postmortems are standard practice in SRE, DevOps, and NOC/SOC environments.
Why Incident Postmortems Matter
Without postmortems, teams repeat the same failures. Institutional knowledge lives in people's heads and leaves when they do. A resolved incident without a postmortem is a learning opportunity wasted — the team absorbs the pain of the outage but captures none of the lessons.
Postmortems create a written record that new team members can learn from, that leadership can use to prioritize infrastructure investment, and that compliance teams can reference for audits. They're the mechanism through which individual incidents become organizational improvement.
In regulated industries — FinTech, iGaming, healthcare — incident documentation is often a regulatory requirement, not optional. Auditors expect structured, timestamped records of what went wrong and what was done about it.
The cost of not writing postmortems compounds: repeated incidents, longer resolution times, eroded customer trust, and engineering teams that feel like they're firefighting the same issues over and over without making structural progress.
What Goes Into a Postmortem
A thorough incident postmortem follows a consistent structure. Most mature operations teams include these nine standard sections:
Executive Summary
A 2–3 sentence overview of the incident for leadership and stakeholders who won't read the full document. Covers what happened, how long it lasted, and the business impact at a glance.
Incident Overview
Service affected, severity level, duration, who was involved, and how the incident was detected. This section establishes the basic facts before diving into analysis.
Root Cause Analysis
The underlying technical or process failure. Not "the server crashed" but why it crashed, what allowed it to crash, and why monitoring didn't catch it sooner. Good root cause analysis goes at least three levels deep — asking "why?" until you reach a systemic cause rather than a symptom.
Timeline
Chronological sequence: detection, triage, escalation, mitigation, resolution, communication. Timestamps matter. This section should read like a log of events, not a narrative summary.
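If the timeline is kept machine-readable, it's easy to template and to feed into tooling later. A minimal sketch in Python, with invented timestamps and events for illustration:

```python
# Illustrative timeline entries: timestamps in UTC, one event per line.
timeline = [
    ("2025-06-12T14:02Z", "Alert fires: checkout error rate above 5%"),
    ("2025-06-12T14:06Z", "On-call engineer acknowledges and begins triage"),
    ("2025-06-12T14:19Z", "Escalated to SEV2; incident channel opened"),
    ("2025-06-12T14:41Z", "Mitigation applied: database connection pool doubled"),
    ("2025-06-12T14:49Z", "Error rate back at baseline; incident resolved"),
    ("2025-06-12T15:10Z", "Status page updated; affected customers notified"),
]

# Render as the log-style list the Timeline section expects.
for ts, event in timeline:
    print(f"{ts}  {event}")
```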
Impact Assessment
Customer impact (users affected, revenue lost, SLA breach), internal impact (engineering hours, reputation). Quantify where possible — "approximately 12,000 users experienced degraded checkout for 47 minutes" is more useful than "some users had issues."
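The arithmetic behind those numbers belongs in the document too. A quick sketch, reusing the example above and assuming a 99.95% monthly uptime SLA (an illustrative target, not a universal one):

```python
# Back-of-envelope availability impact for 47 minutes of degradation.
downtime_min = 47
month_min = 30 * 24 * 60                  # 43,200 minutes in a 30-day month
availability = 1 - downtime_min / month_min

print(f"Monthly availability: {availability:.4%}")      # 99.8912%
print(f"99.95% SLA breached: {availability < 0.9995}")  # True
```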
Resolution & Recovery
What was done to fix it, in what order, what worked, and what didn't. Include failed attempts — they're valuable for future responders dealing with similar incidents.
Preventive Measures
Systemic changes to prevent recurrence: code fixes, monitoring additions, process changes, architecture improvements. These should address the root cause, not just the symptoms.
Action Items
Specific, assigned, time-bound tasks. Each should have an owner, priority, and deadline. Action items without owners don't get done. Action items without deadlines get deferred indefinitely.
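Whatever tracker the team uses, a useful action item has the same shape. A minimal Python sketch; the field names are illustrative, not any particular tool's schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ActionItem:
    title: str       # specific and verifiable, not "improve monitoring"
    owner: str       # a named person, not a team alias
    priority: str    # e.g. "P1"
    due: date        # a real deadline; "someday" means never

item = ActionItem(
    title="Alert on connection-pool saturation above 80%",
    owner="jane.doe",
    priority="P1",
    due=date(2025, 7, 1),
)
```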
Lessons Learned
What the team learned that goes beyond the specific incident. Process gaps, communication failures, tooling needs, or patterns that connect to previous incidents.
Blameless Postmortem Culture
A postmortem that assigns blame to individuals is worse than no postmortem at all. When people expect to be blamed, they stop reporting incidents honestly, they omit details that make them look bad, and the document becomes a political exercise rather than a learning tool.
Blameless doesn't mean accountability-free. It means focusing on systems and processes rather than personal failure. The question is "how did the system allow this to happen?" not "who screwed up?" A deployment that caused an outage points to gaps in CI/CD safeguards, not to the engineer who clicked Deploy.
Organizations that practice blameless postmortems consistently report higher incident reporting rates, more thorough documentation, and faster cultural adoption of reliability practices. The postmortem becomes a tool teams want to use, not one they dread.
Common Postmortem Mistakes
Writing them days or weeks after the incident. Memory fades fast. Critical details about what was tried, what failed, and why decisions were made get lost within 48 hours. Write the postmortem while the context is fresh.
Making them too long or too vague. Nobody reads a 15-page postmortem. Nobody learns from a 2-sentence one. The sweet spot is structured, section-based, and focused — typically 2–4 pages covering all nine sections.
No action items, or action items with no owner. A postmortem without action items is an incident report, not a learning document. An action item without an owner is a wish, not a task.
Skipping root cause and stopping at symptoms. "The database ran out of connections" is a symptom. The root cause might be a connection leak in a specific service, a misconfigured pool size, or a traffic spike that exposed an existing capacity gap.
Not following up on action items. If nobody checks whether the action items from the last postmortem were completed, the postmortem becomes shelf-ware — filed and forgotten until the same incident happens again.
How to Automate Incident Postmortems
Most of the time spent on postmortems is structural: formatting, organizing timelines, writing summaries of what your monitoring tools already captured. The actual analysis — root cause, lessons learned, action items — is where human judgment matters. Everything else is overhead.
AI tools can generate the first draft from incident data, letting humans focus on analysis and action items rather than formatting. Tools like Opsrift can import directly from PagerDuty and Jira, generate a structured 9-section postmortem in under 60 seconds, and push action items back to Jira — closing the loop between incident response and documentation.
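Opsrift's internals aren't public, so the following is only a generic sketch of that import-draft-push loop against the public PagerDuty and Jira Cloud REST APIs; tokens, IDs, and the project key are placeholders:

```python
import requests

PD_HEADERS = {
    "Authorization": "Token token=YOUR_PAGERDUTY_TOKEN",
    "Accept": "application/vnd.pagerduty+json;version=2",
}

# 1. Pull the raw timeline from PagerDuty (pagination omitted for brevity).
incident_id = "PABC123"
resp = requests.get(
    f"https://api.pagerduty.com/incidents/{incident_id}/log_entries",
    headers=PD_HEADERS,
)
entries = resp.json()["log_entries"]

# 2. Turn log entries into a draft Timeline section; analysis stays human.
timeline = "\n".join(f"- {e['created_at']}  {e['summary']}" for e in entries)
draft = f"## Timeline\n{timeline}\n\n## Root Cause Analysis\nTODO: human judgment."

# 3. Push an action item back to Jira (Jira Cloud REST API v2, basic auth).
requests.post(
    "https://your-site.atlassian.net/rest/api/2/issue",
    auth=("you@example.com", "YOUR_JIRA_API_TOKEN"),
    json={"fields": {
        "project": {"key": "OPS"},
        "summary": "Postmortem action: alert on connection-pool saturation",
        "issuetype": {"name": "Task"},
    }},
)
```

The division of labor mirrors the point above: the script handles the structural overhead, while the Root Cause Analysis and Lessons Learned sections remain a human's job.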
For a deeper look at the automated workflow, see How to Automate Incident Postmortems.
Frequently Asked Questions
How long should an incident postmortem take to write?
Manually, 1–3 hours. With AI automation tools like Opsrift, under 60 seconds for the initial draft, plus review time.
Who should write the postmortem?
The incident commander or on-call engineer who led the response, with input from everyone involved. The writer doesn't need to be the most senior person.
When should you write a postmortem?
Within 24–48 hours of incident resolution, while context is fresh. Waiting longer leads to lost details.
What's the difference between a postmortem and a root cause analysis?
A root cause analysis (RCA) is one section of a postmortem. The full postmortem also covers timeline, impact, resolution, action items, and lessons learned.
Should every incident get a postmortem?
For SEV1/SEV2 incidents, always. For lower severity, use your judgment — if there's something to learn, write it up. Some teams use a lighter 'incident summary' format for minor events.