Abstract: At PagerDuty, we see how engineers across the industry are resolving incidents. Some teams take days to resolve problems, some take minutes, and most are somewhere in between.
In this talk, we will share lessons PagerDuty has learned from its own incidents, the steps we have taken to handle them better, and how we run post-mortems. We will then cover trends across the industry: how long outages take, how many people have to get involved, how many teams actually fix their root causes, and how much sleep the average on-call engineer gets.
Learning Outcomes:
- Best practices on incident response