Your System is Not Perfect and That's Great — Balancing Operational Investments
Investment in operations needs to be at the minimal level which maintains your systems in an acceptable state forever. Because no one likes a show-off.
This article is about investment in operations. My history in operations is in the tech industry, on software development teams. Yet, balancing investment in your current systems versus investment in new business is a challenge for leadership everywhere.
On a personal note, our family closed on a new home over the weekend. This means I've spent the last few days carrying boxes, setting up utilities, and being otherwise distracted. While I normally send this article on a Monday, I quite literally could not find all the parts to my computer in time to get it sent.
When I started at Amazon in 2007, I began in the Global Payments organization. During our first team meeting, I remember the engineers complaining that we weren't allocating enough time to improve one of our older systems.
After a couple of years, I moved to Marketplace to run Seller Central. The engineers there were quick to point out that our systems needed some serious investment in our operations. We had been neglecting those older systems.
I moved to Facebook after a few years. Things were a bit different there. The engineers generally picked what they wanted to work on. Interestingly, that meant that cleaning up code wasn't something many people picked. The most common complaint from engineers was that someone needed to spend more time cleaning up the code. They needed to invest more in operations.
AWS, same thing. Devices, same thing. Games, same thing. It turns out that I've never heard of a single successful engineering team which was happy with the amount of time they spent on the operations of their systems.
I began my career as a software engineer, and I spent a significant amount of time as an engineering leader advocating for investing in operations.
I'm going to explain why it's healthy to be unhappy with the state of your operations.
Why care about operational excellence?
There are two major categories of value you get from investing in operations.
Customer experience: Customers expect your systems to operate properly.
Engineering investments: Fragile systems are expensive to maintain.
At a high level, the arguments for investing in your operations are clear. You have customers, and they have expectations. Those expectations include systems which continue to work, and orders which continue to be fulfilled. Engineering teams have new software to build. If they're stuck restarting machines all day, they won't be building new software. They'll also probably quit.
I've had junior engineers ask about the clear discrepancy between value and investments.
"Why is leadership so stupid? Can't they see we should invest more time in our operations?"
I respond that you have two choices. You can believe that everyone senior in the company has lost their minds, and doesn't understand how to run technology. Or you can believe that you work with intelligent people, and if you don't understand their choices, you are likely missing something.
Why you should minimize your investment in operational excellence
Some investments grow a business, and some do not. When you spend a million dollars advertising, you gain some amount of new business. If you invest another million dollars, you gain yet more business. As long as the math makes sense, you continue to spend more money.
Operations work differently. If your systems are down half the time, you will lose all your customers. However, if your systems are down only 0.1% of the time, customer reaction depends on the type of business you're running. If you're running an AWS system, you may need to limit your downtime to 0.0001% of the time before customers stop caring enough to leave. On the other hand, if you're running a backend system which sends emails, it may be able to break regularly without anyone knowing or caring.
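To make those percentages concrete, here is a rough sketch that converts a downtime percentage into clock time per year. The function name and the 365-day year are my own simplifying assumptions, not a formal availability calculation.

```python
# Rough sketch: convert a downtime percentage into time lost per year.
# Assumes a 365-day year and downtime spread evenly (both simplifications).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def downtime_seconds_per_year(downtime_percent: float) -> float:
    """Return the seconds of downtime per year for a given percentage."""
    return SECONDS_PER_YEAR * downtime_percent / 100


if __name__ == "__main__":
    # The three downtime levels mentioned in the paragraph above.
    for pct in (50, 0.1, 0.0001):
        secs = downtime_seconds_per_year(pct)
        print(f"{pct}% downtime: {secs:,.1f} seconds/year ({secs / 3600:,.2f} hours)")
```

Under these assumptions, 0.1% downtime works out to a bit under nine hours a year, while 0.0001% is roughly half a minute a year. That gap is the difference between "nobody noticed" and "customers are shopping for alternatives," depending on your business.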
You don't gain customers by having better operations, with some rare exceptions. For the most part, your systems work well enough for customers, or they don't. There is no prize from customers for having the best codebase.
It's counter-intuitive, particularly to engineers who look at quality as a virtue. I won't deny that quality is a virtue, but it is one which stops giving a return on investment after a certain point.
It comes down to understanding your customers, and the impact of operational issues. Good operations are necessary. Great operations are better. Perfect operations don't matter.
Making matters worse, improving operations tends to get harder the better your operations are performing. If your system completely breaks every hour, the fix is likely to be obvious and easy to spot. If your system breaks every billion transactions, it can be incredibly challenging to debug.
The 80/20 rule works in operations as well as in other aspects of life. The final 20% of issues take 80% of your effort. Perhaps it would be more accurate to say that the final 1% of issues take 99% of your effort.
Balance investments
You need to invest some money in improving your operations, yet too much investment is a waste of time. How do you get investments for operations in general?