An application breaks in the middle of the night, and someone from your technical team rushes to the rescue. They are up all night performing incantations and miracles to minimize data loss and restore service to your users. They are the heroes of the hour, in some cases literally saving the company.
Or are they the villains? Maybe they are the ones who:
- Didn’t use application monitoring tools to proactively watch for issues.
- Didn’t set up automated deployment tools to reduce human error.
- Didn’t have controls in place over their configuration.
- Didn’t do any kind of performance testing.
- Didn’t size systems appropriately for actual workloads.
- Let logs fill up, and let domains and certificates expire.
- Used the latest fad technologies without consideration for the business.
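Several of these are cheap to guard against. As a rough illustration only, here is a minimal Python sketch of two such checks: days left on a certificate and how full a disk is. The function names and thresholds are hypothetical, chosen for this example, and a real setup would more likely rely on a dedicated monitoring tool than a hand-rolled script.

```python
import shutil
import ssl
import time
from typing import Optional

# Hypothetical thresholds for this sketch; tune them to your environment.
CERT_WARN_DAYS = 30
DISK_WARN_FRACTION = 0.90


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date.

    not_after uses the format found in certificates,
    e.g. 'Jan  1 00:00:00 2030 GMT'.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def cert_needs_renewal(not_after: str, now: Optional[float] = None,
                       warn_days: int = CERT_WARN_DAYS) -> bool:
    """True when the certificate expires within warn_days (or already has)."""
    return days_until_expiry(not_after, now) <= warn_days


def disk_nearly_full(path: str = "/",
                     warn_fraction: float = DISK_WARN_FRACTION) -> bool:
    """True when the filesystem holding path is at or above warn_fraction used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= warn_fraction
```

Run on a schedule and wired to an alert, checks like these turn a 3 a.m. outage into a routine ticket.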
As a consultant, I regularly walk into situations where the supposed technical hero of the organization is actually a villain. More often than not, that person falls into one of these two buckets:
1) The Peter principle: Someone was given more responsibility than they are actually competent to handle. This happens easily when non-technical management oversees technical talent.
2) A kingdom builder: Someone technical who enjoys the power of being a gatekeeper for the organization.
An incident post-mortem should not be about blame, but rather about establishing the facts of what happened. Sometimes your technical team knows better. Corners get cut, especially in early-stage companies with limited funding. It’s critical to listen to your technical team when they point out areas of business risk. It’s also critical to ask them to identify those areas so that the risks can be preemptively addressed, or at least documented. Learn whether your heroes really are heroes, and if they aren’t, take action to improve your organization.