Our world seems to hide magic in the mundane. There are no grand displays, glittering lights, or feats of inhuman athleticism. Instead of magic, we have small acts that, when applied diligently and with discipline, compound into magical results over time. Perhaps one of the best examples of this is compound interest. Take a bit of money invested consistently, combine that with an instrument that provides a decent return, and add the magic ingredient of time, and you can realize pretty incredible results. For example, take one hundred dollars as your initial investment, add one hundred dollars every month to an instrument that returns ten percent a year (probably some market, not your savings account), and you end up with a growth curve that looks like this.
Pretty impressive, right? Far too many individuals don't take advantage of the magic of compounding interest. However, a few individuals do, and they reap the rewards that knowledge and patience bring.
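To put rough numbers on that example, here is a minimal sketch in Python. It assumes the ten percent is an annual rate compounded monthly and that the deposit lands at the end of each month. The article specifies neither, and the function name and the thirty-year horizon are only illustrative.

```python
def balance_over_time(initial, monthly_deposit, annual_rate, years):
    """Return the balance at the end of each year, compounding monthly."""
    # Assumption: the annual rate is split evenly across twelve months.
    monthly_rate = annual_rate / 12
    balance = initial
    yearly_balances = []
    for month in range(1, years * 12 + 1):
        # Grow last month's balance, then add this month's deposit.
        balance = balance * (1 + monthly_rate) + monthly_deposit
        if month % 12 == 0:
            yearly_balances.append(balance)
    return yearly_balances


balances = balance_over_time(initial=100, monthly_deposit=100, annual_rate=0.10, years=30)
for year in (1, 10, 20, 30):
    deposited = 100 + 100 * 12 * year
    print(f"year {year:2d}: deposited ${deposited:,.0f}, balance ${balances[year - 1]:,.0f}")
```

The thing to watch is the gap between what you deposited and what the balance shows. It starts small and widens every year, which is the whole point of leaving the money alone.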
Organizations have their own equivalent of compound interest, but all too often, they fail to take advantage of it. That equivalent is a process that iterates on itself via Root Cause Analysis (RCA). Iterating on a process starts when you have a process that is generally working. Something bad happens. You investigate why that 'something bad' happened via an RCA. That RCA discovers the root cause and recommends both immediate fixes for the problem and changes to the process so that the same 'something bad' doesn't happen again. RCAs are a magical tool for organizations in the same way that compound interest is a magical tool for financial growth. The magic ingredient is the discipline to run the RCA consistently and apply the RCA results to existing processes.
However, it doesn't work if you don't have processes or if your processes only exist in the heads of those who run them.
Processes as Rails
Let's imagine you live in a great neighborhood. You are in a cul-de-sac of a cul-de-sac. Your kids and your neighbors' kids all play together. It's so good that one of your neighbors made up a song about how great it is and how you can keep it great. It's catchy, like one of those marketing tunes that gets stuck in your head from time to time. No one ever wrote it down, but anyone in the neighborhood can sing it at the drop of a hat. One day, you are sitting in the cul-de-sac having drinks with a few of your neighbors, and you decide to change the song. You come up with a few new verses and some changes to existing verses. Everyone gets excited and agrees that it's so much better. Later in the same week, everyone gets together and starts singing the song. The folks who had drinks in the cul-de-sac earlier in the week sing the new version of the song. The folks who were not there sing the old version. The singing immediately becomes discordant. No problem. The folks who know the new song teach it to those who don't, and they very quickly reach harmony again. That is, until the Fourth of July celebration, where even more neighbors come out. The situation repeats. Eventually, everyone knows the new song, but it takes a year or two and a lot of discordance at the beginning of gatherings.
Processes are a lot like those songs. If your processes live in people's heads, they are tough to change, and those changes take a very long time. RCAs are a tool to change processes, but they don't work very well if you don't write the processes down. So, document your processes. When you document them, put that documentation in a special place for curated, important content. Don't just drop them in the company wiki with all the other outdated random crap there. Otherwise, they take on the value of the crap that exists wherever you put them. When you hire someone new, include process review as part of the new hire onboarding. When someone asks about the process, point them at the documentation rather than telling them about it. In other words, live the process in your organization so that changes and improvements to those processes are meaningful.
Getting to the Root of the Problem
There are a million ways to structure an RCA and format the documentation that results. A quick Google search will turn up several ways to run an RCA meeting. Confluence has a great template by OpsGenie already built right in for documenting the results. Regardless of the approach you take, there are two requirements. First, you must include a Five Whys analysis. To do that, get everyone involved in the problem into a room and dig deep using the Five Whys method. Second, you must go to the fifth why. Let me repeat that, YOU MUST GO TO THE FIFTH WHY. Laziness is going to tempt you to stop asking why after the second or third why. You will feel like you have already gotten the right answer, and there is no need to go further. You are wrong. In literally every case, the most interesting and impactful solutions come in the fourth or the fifth why. Those are going to be the answers that change your processes and your organization for the better.
There is a slight variation on Five Whys that I highly recommend: the Why Tree. Complex problems seldom have a single answer. When you ask, 'Why did X happen?', there is almost always more than one answer, and each of those answers is often equally plausible. Why Trees take this into account. They allow you to have multiple answers for each 'Why' question, and each of those answers creates a new branch. This technique does not get you out of the requirement that you go at least five 'Whys' deep on every branch. However, it does give you much broader coverage of a problem, and the follow-on solutions tend to be more coherent and impactful.
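If it helps to see the shape of a Why Tree, here is a minimal sketch in Python. The `Why` class, the `shallow_branches` helper, and the outage in the example are all hypothetical and made up for illustration. The one rule it encodes is the one above: every branch has to go at least five whys deep, so the helper lists the branches that stopped early.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Why:
    """One node in the tree; its causes answer 'why did this happen?'."""
    answer: str
    causes: list[Why] = field(default_factory=list)


def shallow_branches(node, min_depth=5, path=()):
    """Return every branch that ends before reaching min_depth whys."""
    path = path + (node.answer,)
    if not node.causes:
        # The root is the problem statement, so it does not count as a why.
        return [path] if len(path) - 1 < min_depth else []
    found = []
    for cause in node.causes:
        found.extend(shallow_branches(cause, min_depth, path))
    return found


# Hypothetical outage with two branches: one complete, one abandoned early.
problem = Why("Checkout was down for 40 minutes", [
    Why("The database ran out of connections", [
        Why("A deploy doubled the worker count", [
            Why("Pool size is not on the deploy checklist", [
                Why("The checklist lives in one engineer's head", [
                    Why("Nothing requires checklists to be written down"),
                ]),
            ]),
        ]),
    ]),
    Why("Nobody was paged for 20 minutes", [
        Why("The alert threshold was too high"),
    ]),
])

unfinished = shallow_branches(problem)
for branch in unfinished:
    print(" -> ".join(branch))
```

In this made-up example the paging branch stops at the second why, so it is the one that gets printed. That is the branch to take back into the room and keep asking about.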
Do the Work
Doing RCAs is expensive. It takes time from every single person who could be part of the problem. It takes a significant amount of time from the person leading the RCA itself. You waste this time if you don't make any of the changes that the RCA recommends. RCAs often generate quite a bit of new work. That work doesn't advance the product. It's not a new feature. It's not reinvestment that drives down costs. It's not anything that any stakeholder wants. It is simply something that you need to do. For the magic of RCAs to work, you have to spend the time to execute the results, to integrate and productize the improvements. Otherwise, you are like the saver who puts money in an account that returns ten percent interest but only leaves it there for a couple of years. The magic of compounding interest never has time to take effect. If you don't do the work discovered by RCAs, the magic of RCAs applied to processes never materializes.
Do it Now
RCAs have a half-life. That half-life is somewhere between one and three days. That is the window of time after an outage in which you have to do the RCA. If you miss it, you start losing the metrics. The people involved start forgetting. You very quickly lose your ability to do a coherent and complete RCA. That means that the RCA has to be a priority, trumping all other work except an outage.
RCAs for Everything
RCAs are not just for Product Engineering Teams having an outage. They are for any process that can fail. For example, if you project delivery timelines and you fail to meet those timelines, it's worth doing an RCA on why. You can roll what you learned from that RCA into your estimation and planning process. If you hired someone and that person left after a month, you can do an RCA on why and roll what you discover back into your interview and management processes. RCAs paired with documented processes are part Swiss Army knife, part magic wand: they give you deep insights into where your processes fall short. Use those insights to improve your processes, and build RCAs into every aspect of your organization.
RCAs for Everyone
RCAs are extraordinarily valuable if you are a CTO, VP of Engineering, Director, etc. implementing them for your organization. However, RCAs are useful at any level, from ones that an engineer runs on themselves to ones that a manager might drive just for their team. They are a tool of iterative improvement, and as we discussed in this article, nearly anything that can be defined can be improved.