February 8, 2025

Embracing Complexity in Software Development

The world we live in, and therefore any software we develop, is non-negotiably complex. Even a product believed to be simple can be deployed into a complex environment, with unforeseen consequences. The products we’re building and the environments we’re deploying them to are becoming more complex and more interconnected, and they are changing constantly.

Complexity in software development creates risk: risk of unexpected and undesired behavior; schedule risk; risk of a product failing to do what it was designed to do; risk of exposing sensitive data; risk of impairing or even destroying other elements in its ecosystem; risk of things going wrong in ways that we can’t even imagine.

This unwanted behavior produces a spectrum of consequences, from the inconsequential to the nuisance to the destructive. Data loss, loss of business, loss of property, harm to human health, or worse can result from complexity that is inadequately perceived and poorly managed.

By being honest about complexity and committing to perceive and respond to it, we can manage it. “Manage” here does not mean eliminate; it means observe, understand, respond, and anticipate. The risk of unwanted behavior never goes away. It simply waits until something changes, and things are always changing.

The Myth of Simple
It’s important to embrace a perspective that accepts, perceives, and is curious about complexity. For an engineer, this is a day-to-day and lifelong journey. One could say that the biggest enemy of successful product development is reductionism: failing to anticipate, or even admit, how complex a situation is. We want to believe things are simpler and better understood than they are.

As developers, we should strive to be dispassionate, curious scientists. Adam Grant, author of Think Again, refers to “the joy of being wrong”. We need to be willing to let go of what we believe and to be wrong, putting aside our ego and welcoming a larger, more accurate view. This lets us align our development with the nuances of what we actually observe, not what we wish were true.

Heuristic Traps
In 2002, Ian McCammon published a groundbreaking study of the psychological factors that lead backcountry travelers to get caught in avalanches. He called these factors “heuristic traps”: mental shortcuts that can lead to errors in decision making. These traps apply surprisingly well to the mistakes we can make in managing software development.

“Familiarity” is our tendency to keep doing things the way we have done them in the past because those decisions worked for us before. We ignore new data and repeat old decisions. The antidote to the familiarity trap is “beginner’s mind”: be disciplined about measuring system behavior objectively, and stay curious about it. We need to test our assumptions, and to do that we first need to be aware of them! This is not easy.

“Social proof” is our tendency to believe that if others are doing something, it must be OK. The antidote to this herd mentality is thinking independently and speaking up when things don’t seem right. Disagreeing, challenging groupthink, and admitting when we are wrong or simply “don’t know” should all be supported in a healthy development team’s psychology and culture.

“Commitment” is the inertial belief that because we’ve invested significant energy in a given effort, design, or process, it must be correct. This is difficult to counter, especially in a large project with many people working on it, and it becomes more pronounced the closer we get to delivery. The antidote is testing, honestly evaluating the results, and remaining responsive to what the testing tells us.

“Scarcity” distorts the value of opportunities we perceive as limited. This trap is familiar to any development team on the receiving end of the “time to market” whip. We incur technical debt to ship faster, which undermines quality and increases risk. The antidote to scarcity is sober project management, a shared understanding of the complexity of the problem domain, and respectful rapport among marketing, engineering, and the business folks making the decisions.

Tricks of the Trade
The first priority in managing complexity is a commitment to observability, measurement, and metrics. Recorded, impartial observations are vital. Put effort into organizing the stats flowing from your test automation frameworks, CI/CD pipelines, daily builds, deployment telemetry, and so on. Don’t let that data just sit there. Bring it to life! Transform those stats into graphics that tell a story, and define alerting criteria. Data visualization allows your team to perceive complexity more intuitively, so strive for graphics that even non-technical people can understand. Prioritize intelligent instrumentation.
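As a minimal sketch of what defining alerting criteria can look like, the Python below keeps a sliding window of pass/fail observations (from test runs or device telemetry, say) and fires when the failure rate crosses a threshold. The class name FailureRateAlert and the default window, threshold, and minimum sample count are illustrative assumptions, not recommended values; in practice the same rule would more likely live in your monitoring system.

```python
from collections import deque


class FailureRateAlert:
    """Hypothetical alert rule: fire when the failure rate over the most
    recent `window` observations exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05, min_samples: int = 20):
        self.threshold = threshold
        self.min_samples = min_samples  # don't alert on too little data
        self.samples = deque(maxlen=window)  # oldest observations drop off automatically

    def failure_rate(self) -> float:
        if not self.samples:
            return 0.0
        return sum(1 for ok in self.samples if not ok) / len(self.samples)

    def record(self, ok: bool) -> bool:
        """Record one observation (True = pass) and report whether the alert fires."""
        self.samples.append(ok)
        if len(self.samples) < self.min_samples:
            return False
        return self.failure_rate() > self.threshold
```

Feeding the rule a stream of results and plotting failure_rate() over time gives both the alert and the story-telling graphic from the same data.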

Test continuously, and keep updating your tests as defects are found, especially after they’re fixed: a test that reproduces each fixed defect keeps it from quietly returning. The importance of regression testing cannot be overemphasized.
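One way to make regression testing concrete, sketched in Python under assumed names: when a defect is fixed, a test that reproduces the original failure joins the suite permanently. The defect below (version strings compared as text, so "1.10" sorted before "1.9") and the helpers parse_version and is_newer are hypothetical, chosen only because the bug is easy to see.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Hypothetical helper: turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_newer(candidate: str, installed: str) -> bool:
    return parse_version(candidate) > parse_version(installed)


def test_regression_version_compared_as_text():
    # Hypothetical defect: versions were once compared as plain strings, so
    # "1.10" looked older than "1.9" and devices skipped a valid update.
    # This test reproduces that report and stays in the suite after the fix.
    assert is_newer("1.10", "1.9")
    assert not is_newer("1.9", "1.10")
    assert not is_newer("2.0.0", "2.0.0")
```

Under pytest, the test_ prefix is enough for the test to be collected on every run, so the old failure is rechecked on every build.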

Require engineers to test their own work, and put tools in their hands that support that testing. Engineers should feel a personal stake in meeting acceptance criteria, which avoids the time-consuming cycle of relying on the test team to find simple issues that then bounce back to be fixed.

Have all parties agree on, and commit to, an acceptable arrival-rate curve for new defects over the lifecycle of the product. Budget for increased entropy at the beginning. Resource the project to observe, respond to, and eventually tame that curve. Publish the arrival rate to all team members. It’s not lying to you! If it isn’t settling in a reasonable way, you can bet on the not-so-fake proverb: if you don’t change direction, you’ll end up where you’re going.
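A rough sketch, assuming defect-open dates can be exported from the tracker, of how the arrival rate might be bucketed by week and checked against an agreed curve. The halving-per-period budget shape, the 20% tolerance, and all function names here are assumptions for illustration; the curve a team agrees on could take any shape.

```python
from collections import Counter
from datetime import date


def weekly_arrivals(opened: list[date], start: date, weeks: int) -> list[int]:
    """Count newly opened defects in each 7-day period starting at `start`."""
    counts = Counter((d - start).days // 7 for d in opened if d >= start)
    return [counts[w] for w in range(weeks)]  # Counter returns 0 for empty weeks


def budget_curve(initial: float, half_life_weeks: float, weeks: int) -> list[float]:
    """Hypothetical agreed curve: expected weekly arrivals halve every `half_life_weeks`."""
    return [initial * 0.5 ** (w / half_life_weeks) for w in range(weeks)]


def weeks_over_budget(actual: list[int], budget: list[float], tolerance: float = 1.2) -> list[int]:
    """Return the week indices where arrivals exceed the budget by more than `tolerance`."""
    return [w for w, (a, b) in enumerate(zip(actual, budget)) if a > b * tolerance]
```

A nonempty result from weeks_over_budget is the signal to change direction; publishing the weekly counts alongside the budget curve gives every team member the same view of it.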

Consider organizing your software around a “backplane” into which independently versioned and updatable modules or microservices can be snapped in and out in the field as necessary. Decouple dependencies between modules as much as possible, including dependencies on where they must run.
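To illustrate the backplane idea, here is a minimal in-process sketch in Python: modules register under a name and version, callers reach them only through the backplane, and a module can be replaced or removed without touching the others. The Backplane and Slot names and the dictionary-in, dictionary-out message format are hypothetical; in a real product the slots might be separate processes or microservices behind a message bus.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Handler = Callable[[dict], dict]


@dataclass
class Slot:
    version: str
    handler: Handler


@dataclass
class Backplane:
    """Hypothetical backplane: modules are looked up by name at call time,
    so each one can be versioned and swapped independently."""
    slots: dict[str, Slot] = field(default_factory=dict)

    def plug(self, name: str, version: str, handler: Handler) -> Optional[str]:
        """Install or replace a module; return the version it replaced, if any."""
        previous = self.slots.get(name)
        self.slots[name] = Slot(version, handler)
        return previous.version if previous else None

    def unplug(self, name: str) -> None:
        self.slots.pop(name, None)

    def send(self, name: str, message: dict) -> dict:
        slot = self.slots.get(name)
        if slot is None:
            # Degrade gracefully instead of crashing when a module is absent.
            return {"error": f"module '{name}' not installed"}
        return slot.handler(message)
```

Because callers only know a module's name, upgrading "telemetry" from one version to the next is a single plug call, and the previous version is reported back so it can be restored if the new one misbehaves.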

Roll out updates carefully, in stages. Start with a small, low-risk population and gradually expose more devices to the new software while closely monitoring system health. Have a well-tested rollback plan and be ready to use it!
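A staged rollout can be sketched as a loop that widens exposure, checks health, and rolls back on the first failed check. The stage percentages, the hashing scheme in in_cohort, and the deploy, healthy, and rollback callbacks are all assumptions standing in for whatever the real fleet-management system provides; a deploy callback would typically use something like in_cohort to decide which devices receive the update.

```python
import hashlib
from typing import Callable, Sequence


def in_cohort(device_id: str, percent: float) -> bool:
    """Deterministically map a device to one of 100 buckets; a device inside
    a small cohort stays inside every larger one as exposure grows."""
    bucket = int(hashlib.sha256(device_id.encode()).hexdigest(), 16) % 100
    return bucket < percent


def staged_rollout(
    stages: Sequence[float],
    deploy: Callable[[float], None],
    healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> float:
    """Advance through exposure stages, checking health after each one.
    Returns the final exposure, or 0.0 if a health check failed."""
    for percent in stages:
        deploy(percent)  # expose the new version to `percent` of devices
        if not healthy():  # e.g. crash rate, error rate, check-in rate
            rollback()  # return exposed devices to the last good version
            return 0.0
    return stages[-1] if stages else 0.0
```

Hashing the device ID, rather than picking devices at random at each stage, keeps cohorts stable, so the devices that were healthy at 5% are still in the population at 25%.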

Anticipate and emulate the effects of scaling, special events, outages, and end of life. What happens when 1M devices phone home? 10M? 100M? How does your backend hold up, in both capacity and cost, from product infancy through a hypothetical wildly successful future? What happens during Black Friday or the Super Bowl? What happens when the backend is unavailable due to an outage or end of life? Never underestimate the possibility that connected products sit dormant for years, even a decade or more, and then come to life for the first time long after backend support for the product has reached end of life (EOL). What does the consumer experience in those cases?
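Some of these questions can be answered with back-of-the-envelope arithmetic before any load test. The sketch below compares the average request rate with a “reconnect storm” in which part of the fleet phones home at once, for example when the backend returns after an outage. The 24 check-ins per day and the 10% of devices reconnecting within 60 seconds are hypothetical inputs, not measurements.

```python
SECONDS_PER_DAY = 86_400


def average_rps(devices: int, checkins_per_day: float) -> float:
    """Mean requests per second if check-ins are spread evenly over the day."""
    return devices * checkins_per_day / SECONDS_PER_DAY


def reconnect_storm_rps(devices: int, fraction: float, window_s: float) -> float:
    """Requests per second if `fraction` of the fleet phones home within `window_s` seconds."""
    return devices * fraction / window_s


def monthly_requests(devices: int, checkins_per_day: float, days: int = 30) -> float:
    """Total requests per month; multiply by a per-request price to estimate cost."""
    return devices * checkins_per_day * days


for fleet in (1_000_000, 10_000_000, 100_000_000):
    avg = average_rps(fleet, checkins_per_day=24)
    storm = reconnect_storm_rps(fleet, fraction=0.10, window_s=60)
    print(f"{fleet:>11,} devices: {avg:>9,.0f} req/s average, {storm:>11,.0f} req/s storm")
```

With those inputs, a million devices average under 300 requests per second, while the one-minute storm is roughly 1,700 requests per second, six times higher; the peak, not the average, sizes the backend. Multiplying monthly_requests by your provider's per-request price gives a first-order cost estimate at each fleet size.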

If we embrace complexity, use the tools described above, and avoid the traps, we can meet the challenge of building successful software products for a demanding, connected deployment environment.

References & Additional Reading
  • Grant, A. (2021). Think Again. Viking.
  • McCammon, I. (2002). Evidence of heuristic traps in recreational avalanche accidents. International Snow Science Workshop, Penticton, B.C., 244-251.
  • Reason, J. T. (1990). Human Error. Cambridge University Press.
  • Suzuki, S. (1970). Zen Mind, Beginner's Mind. Weatherhill.