Stoic Software Leadership: Dependencies and Negative Visualization

Kiefer Hagin
4 min read · Nov 26, 2021

“It’s not what happens to you, but how you react to it that matters.”
Epictetus

In modern software development, we are surrounded by a myriad of potential events that we cannot control: managed cloud environments becoming unreachable, sudden bursts of customer usage, zero-day exploits, star engineers jumping ship for the shiny new start-up, global pandemics…the list goes on. As an aspiring Stoic, I thought I would share one way that I integrate my philosophical practice into my software leadership role and help my teams focus on what they can control.

When developing software at a rapid pace, it is easy to lose sight of the spiderweb of potential failure modes we are introducing into our SaaS environment. In my role as a software engineering leader, I believe an important part of my job is identifying dependencies with failure modes that are not under my control, and rationally prioritizing initiatives that help my team defend against those potential failures.

The Dichotomy of Control

“Events do not just happen, but arrive by appointment.”
Epictetus

Stoics adhere to the notion that there are two segments of circumstances in the universe: those that are up to us (within our control) and those that are not. The segment containing things that are not within our control is much, much larger.

What is in our control is how we prepare for the events that are not. If we start to think about all of the pieces of software that we integrate with on a daily basis that keep our business running smoothly, we can start to see how large that bucket of potential failure states and their effects can be.

Negative Visualization

“We cannot choose our external circumstances, but we can always choose how we respond to them.”
Epictetus

So, how do we best prepare for circumstances that are outside of our control? One Stoic practice that may help us is “Negative Visualization.” We play the tape all the way through, assuming the very worst-case scenarios, and identify the areas where preparation and action (what we can control) might make those scenarios more manageable and less risky. We stop taking the stability of our dependency graph’s weak points for granted.

I can start by documenting all of the critical dependencies I have in my stack: people, infrastructure, cloud environments, on-premise instances, my personal laptop, anything that I can imagine. For each, I can walk through the worst-case scenario if that dependency were to fail.

There are many different failure modes for most of these dependencies. An employee can get sick, or they can leave the company, or they can get hit by a bus. Some critical service in AWS can have a 5-minute blip in reachability or it can go down for multiple hours. The goal is to account for realistic terminal states of failure and have a plan for what you can do if the dependency enters, or is approaching, one of those states.
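One lightweight way to keep this list honest is to write it down as data. The sketch below is only an illustration: the `Dependency` and `FailureMode` types, the `gaps` helper, and every entry in `register` are hypothetical names and examples, not a prescribed format. It records each dependency with its failure modes and surfaces the ones that have no plan yet.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FailureMode:
    description: str  # what goes wrong
    plan: str = ""    # what we can do about it; empty means no plan yet


@dataclass
class Dependency:
    name: str
    kind: str  # e.g. "person", "cloud service", "hardware"
    failure_modes: list[FailureMode] = field(default_factory=list)

    def unplanned(self) -> list[FailureMode]:
        """Return the failure modes that have no documented plan."""
        return [m for m in self.failure_modes if not m.plan.strip()]


# Hypothetical entries, loosely based on the examples in the text.
register = [
    Dependency("lead backend engineer", "person", [
        FailureMode("out sick for a week", "teammates cover their services"),
        FailureMode("leaves the company"),
    ]),
    Dependency("critical cloud service", "cloud service", [
        FailureMode("5-minute blip in reachability", "retry with backoff"),
        FailureMode("multi-hour outage"),
    ]),
]


def gaps(deps: list[Dependency]) -> list[tuple[str, str]]:
    """List (dependency, failure mode) pairs that still need a plan."""
    return [(d.name, m.description) for d in deps for m in d.unplanned()]


for name, description in gaps(register):
    print(f"No plan yet: {name} -> {description}")
```

Even a register this small makes the unprepared cases visible, which is where the visualization exercise should spend its time.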

“Regardless of what is going on around you, make the best of what is in your power, and take the rest as it occurs.”
Epictetus

At some point, we will find that the level of preparation necessary to fully eradicate the negative effects of a potential event is too costly or burdensome. The point of this exercise isn’t to bring every possible event under our control, but rather to reasonably minimize the potential damage and to understand where the remaining risks lie.

In my experience, preparation for external circumstances usually looks like one of two activities: creating redundancy, or enacting proactive behavior. As an example, I can’t make every employee at my company stay forever, but I can do my very best to create a work environment where most people want to stay for a lengthy period of time (proactive behavior), and ensure I cross-train folks so that no one is a knowledge silo (redundancy).

I can’t ensure that my third-party email service will never go down, but I can add logging, dead letter queues, and other defensive measures to try to mitigate the impact (redundancy). I can add alerts and alarms so that I find out early and can communicate with my customers and internal stakeholders (proactive behavior).
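To make the email example concrete, here is a minimal sketch of those defensive measures, under loud assumptions: `EmailOutbox` is a hypothetical wrapper, `send` stands in for whatever client call your email provider exposes, the in-memory `deque` stands in for a real dead letter queue, and the `alerts` list stands in for a real paging or alerting system.

```python
import logging
from collections import deque

logger = logging.getLogger("email")


class EmailOutbox:
    """Wraps a third-party send function and parks failed messages for replay."""

    def __init__(self, send, alert_threshold=3):
        self.send = send              # callable(message); may raise on failure
        self.dead_letters = deque()   # in-memory stand-in for a real dead letter queue
        self.alert_threshold = alert_threshold
        self.consecutive_failures = 0
        self.alerts = []              # stand-in for a real alerting system

    def deliver(self, message):
        """Try to send; on failure, log it, park the message, and maybe alert."""
        try:
            self.send(message)
        except Exception as exc:
            logger.warning("send failed, parking message: %s", exc)
            self.dead_letters.append(message)
            self.consecutive_failures += 1
            # Alert once, when the failure streak first reaches the threshold.
            if self.consecutive_failures == self.alert_threshold:
                self.alerts.append(
                    f"email provider failing: "
                    f"{self.consecutive_failures} consecutive errors"
                )
            return False
        self.consecutive_failures = 0
        return True

    def replay(self):
        """Retry each parked message once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self.dead_letters)):
            message = self.dead_letters.popleft()
            if self.deliver(message):
                delivered += 1
        return delivered
```

The outage itself stays outside my control. What the wrapper changes is the blast radius: messages are not lost, and I hear about the problem after a few failures rather than from a customer.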

Prioritize

Once we have a list of preparations or improvements we could reasonably make, we can start to prioritize them and put them into action. We start with single points of failure that are either highly likely to fail or would be crippling if they did. We work down the list, being careful not to spend time on things that have little or no impact. It is almost always better to spend more time on something critical than to account for every eventuality.
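As a rough sketch of that ordering, the snippet below ranks a backlog with a simple likelihood-times-impact score, puts single points of failure first, and drops items below a cutoff. The scoring scheme, the 1 to 5 scales, the `min_score` cutoff, and the backlog entries are all my assumptions for illustration; any scheme your team trusts would serve the same purpose.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    likelihood: int  # 1 (rare) to 5 (expected); a rough team estimate
    impact: int      # 1 (minor) to 5 (crippling)
    single_point_of_failure: bool = False

    @property
    def score(self) -> int:
        """Simple risk score: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def prioritize(initiatives, min_score=4):
    """Drop low-score items, then sort: single points of failure first,
    highest score first within each group."""
    worth_doing = [i for i in initiatives if i.score >= min_score]
    return sorted(
        worth_doing,
        key=lambda i: (not i.single_point_of_failure, -i.score),
    )


# Hypothetical backlog entries.
backlog = [
    Initiative("dead letter queue for email", likelihood=4, impact=3),
    Initiative("spare laptop charger", likelihood=2, impact=1),
    Initiative("multi-region failover", likelihood=1, impact=5,
               single_point_of_failure=True),
    Initiative("cross-train on billing service", likelihood=3, impact=5,
               single_point_of_failure=True),
]

for item in prioritize(backlog):
    print(f"{item.score:>2}  {item.name}")
```

The numbers are estimates, not measurements. Their job is to force the conversation about which failures would actually hurt, and to give the team permission to skip the rest.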

Conclusion

Stoic practices make for fantastic common-sense leadership exercises, and leading software teams is no exception. By digging deeply into our dependency graph across our organization and visualizing worst-case scenarios, we can quickly see where we need to improve and create a list of initiatives that can bring our team, our customers, and our stakeholders a little more peace of mind.

Kiefer Hagin

VP of Engineering @ Waypoint. Practicing Stoic, musician, and coach.