Butterflies, blizzards, organizational change, cake, and the Great Red Spot of Jupiter. (And more cake, please.)

Dave Nicolette, January 29, 2009

You ever hear this one? "No matter what we try, nothing really changes." Yeah. I've heard it, too. Let's talk about it in a minute. In the meantime, here's something cool:

We're living in an era of debate, experimentation, and rapid evolution of methods in software development and in project and program management. A range of philosophies, process frameworks, methodologies, and bags o' tricks are in use (or abuse) as companies try to find solutions to problems that may or may not exist and that may or may not be properly understood. Everyone seems to have their own pet solutions and, in some cases, particular personal hates, too. And everyone is so certain about those pets and hates. Everyone "knows."

Well, almost everyone. I don't "know" everything. I don't have a single, simple answer that will solve all problems in all environments, circumstances, and domains. But there is one thing that all the various ideas in play appear to have in common: They take a mechanistic approach. What I mean by "mechanistic" in this context is that there is a general assumption that if we can just find the right process, and then if we can just follow that process to the letter, "it" will guarantee success for us. We can put our brains in jars on a shelf and just trust the process. In my time I've heard more than one manager say, "We need to find the perfect process so that I can hire Anyone Off The Street and have them deliver software successfully."

In the magical Land of Oz, we can achieve success in our work in the same way as we follow a cake recipe. Just measure out the specified ingredients, set the oven to the specified temperature, set the timer to the specified time interval, and when the bell rings you'll have cake. It works every time. It works for chocolate cake, yellow cake, carrot cake, spice cake. It works in New York, Caracas, Osaka, Milan. There's a recipe for software, too. It works every time, for every type of software in every context, everywhere. It's as easy as cake. Mr. and Ms. Anyone are now off the street and delivering software like clockwork (or clockwork oranges, anyway).

It's funny, but whenever I characterize a proposed process improvement initiative as "mechanistic," its proponents insist that they are cognizant of the importance of human factors. My rule of thumb for assessing their view of humans in an organization is to see whether they use the term "resources" to describe people. A "resource" is some sort of machine or machine part, like a server or a stapler or an ink cartridge. I often find myself in the role of a sort of Joseph Merrick of software development, having to assert, "I am not a resource. I am a human being!"

Granted, a couple of the conceptual frameworks that are being used today do take human factors into account; namely, lean and agile. But even in those communities there seems to be a goal (or perhaps just a hope) that we can arrive at a formula that will work over and over again in any context and in any domain, just like a cake recipe. I don't "know" much, but I'm pretty sure a mechanistic approach will never work as long as humans are involved. Any human organization is a complex system.

"No matter what we try, nothing really changes. We've tried:

You say: Well, lots of things are "complex." That doesn't mean they aren't manageable.

I say: True. When I say "complex system," I'm thinking about a model like the one described by Sholom Glouberman and Brenda Zimmerman. They have this notion that systems fall into three broad categories, loosely speaking: Simple, complicated, and complex. The cake recipe approach works well for simple systems. It's possible to come up with a repeatable series of steps that can be guaranteed to yield success in all (or nearly all) cases. Complicated systems are characterized by larger scale, more significant coordination or logistical challenges, and/or a need for specialized expertise. They are less predictable than simple systems because of their many interrelated subsystems, but they are still amenable to formalized, repeatable processes. It may take more time and experience to develop a suitable process, but at least it is feasible to develop a process.

Simple Complicated Complex

Then there's that third category, complex systems. Now we're getting into a level of complexity that's a bit harder to rein in. It's like the overall weather system of planet Earth. It's really hard to predict what's going to happen outside of a relatively limited geographical area and time frame, except at a very high level of abstraction. Colloquially, some people like to talk about the Butterfly Effect. If a butterfly in Brazil decides to turn left instead of right, it might have some effect on the next blizzard in Siberia. Predicting that effect, measuring the outcome to test the prediction, and being sure it will happen again the same way every time...well, that's kind of hard. It has something to do with Chaos Theory; "sensitive dependency on initial conditions," or something like that. Not my field, you know. I do understand, however, that in reality butterflies can't change weather patterns, flap and flutter though they may.

The little changes we try in our organizations — like TQM, Six Sigma, Agile development, and so on — can't really change the overall conditions in the organization, either, flap and flutter though we may. Organizations have a sort of equilibrium. To change them fundamentally, we have to disrupt the equilibrium. It isn't sufficient to introduce some sort of process improvement program. That's no more effective than a butterfly's wing.

Maybe we can illustrate this by looking at a somewhat simpler weather system than the one on planet Earth; one that's writ large enough for us to read at a distance. Then maybe we could observe how certain long-lived features can form and persist in the atmosphere, and use that as a sort of conceptual model for other complex systems, such as human organizations. Hmm. Now, where might we find such an example? ;-)

The Great Red Spot has been a prominent feature in Jupiter's atmosphere for at least 300 years. That's when humans first noticed it. The main reason a hurricane can last such a long time on Jupiter is the planet's rate of rotation. Although it's substantially larger than Earth, Jupiter's day is about 10 hours long. Friction from the planet's rapid rotation has set the atmosphere into relatively fast motion. It has settled into distinct bands that travel around the planet at different speeds. The regions at the boundaries of the bands are highly turbulent and give rise to rotation quite frequently. Storms can't move north or south; they're constrained within the band where they formed. So, when two storms meet, they have to compete for space. One storm eats the other. The Great Red Spot has been consuming other hurricanes that come near it for hundreds of years. The Red Spot is trapped within one band of the atmosphere, and energy from the boundaries with neighboring bands keeps it turning. We might say it's in an equilibrium state created by the forces that surround it. Whenever it's challenged by another storm, it simply absorbs the other storm and keeps on churning. Other spots (some red, some not-so-red) come and go. The Great Red Spot lives on, unperturbed.

Organizations are like that, too, in a way. There's an overarching flow to the way things happen in the organization. Round and round it goes, always the same. When we introduce any sort of process improvement, it gets gobbled up by the larger rotation and destroyed, like a storm that ventures near the Great Red Spot. But there are no humans on Jupiter, and no fast-moving atmospheric bands in the typical business enterprise. What holds these long-lived, stable work flows in place? What makes them so hard to dislodge?

As I said, I don't "know" much, but I have an idea of why that may be true. As an organization grows larger, people find it harder to coordinate their activities to achieve business goals. To cope with this, they create standard procedures to guide the interaction between different working groups. As the business grows and greater demands are placed on the staff, each working group has to make more and more requests for the services of other working groups. To cope with the increasing load of incoming requests, each working group creates its own standard procedures for others to follow when requesting services of them. (You lean thinkers out there will recognize this as a form of local optimization. I think you can see where I'm going with this. I'll keep writing anyway, for the benefit of the "heavy" thinkers.) The effect of all those new procedures, which aren't really "standard" since each working group invents their own, is to make the overall clumsiness of the organization even greater than it was before. That is, the treatment makes the disease worse, which prompts the doctor to increase the dosage, which makes the disease even worse, and so on.

Now, when we introduce a process improvement idea, it tends to fall into just one or a few working groups within the organization. Say the company organizes a Six Sigma initiative. They form a new team of Six Sigma people, most of them freshly-minted Green Belts but (hopefully) including at least one Black Belt, and they set out with great enthusiasm and high hopes to fix the organization. To do so, they must request extra work from many of the working groups in the organization. Those working groups respond to the additional requests for their time in the way they always have: They thicken the walls and widen the moats around their part of the organization. Ultimately, the Six Sigma team becomes yet another working group competing for slices of everyone else's time. By the time the dust settles, the problems are worse than before and the organization's equilibrium remains unchanged.

So people start to say, "We tried Six Sigma and it didn't work. What shall we try next?" Let's say, for the sake of discussion, they decide to try Agile Development next. That tends to fall into the software development arena. In most organizations, software development has close dependencies on other working groups such as business analysis, quality assurance, information security, data management, network services, enterprise architecture, and others. Since they aren't involved with the Agile Development initiative, they respond to the change in the way they always have: They thicken the walls and widen the moats around their part of the organization. Now there are even more different kinds of working groups and even more "standard" procedures for everyone to follow when they try to get any work done. So people start to say, "We tried Agile Development and it didn't work. What shall we try next?"

Hey, here's an idea: What if, instead of just repeating the same pattern with a different set of buzzwords, we try to identify the forces that are holding the organizational dysfunction in equilibrium, and then disrupt those forces?

You say: If it were that easy, someone smarter than you would have thought of it already.

I say: It isn't a question of how smart they are. It's the age-old phenomenon of the blind men and the elephant. Each blind man encounters just a portion of the elephant, and based on that experience he thinks he knows all about elephants. All the blind men are right, as far as they can see. Each figures he understands this complex system, this elephant, and if he can just create a predictive model based on that understanding, then others will be able to deal with elephants just as effectively.

Another popular term for the phenomenon is retrospective coherence, also known as 20-20 hindsight. We've seen something work well, so we figure if we can just duplicate it then we can enjoy the same success again and again. Joseph Pelrine has an entertaining way of explaining it. He tells it better, but here's my crude summary. Say you had a party, and it went really, really well. Everyone had a great time. You enjoyed it so much, you'd love to repeat the experience. So you try to remember everything you can about the party. The day of the week, the time of day, what food and drinks were served, who was present, what the conversations were about, which games people played, what music was on — as much as you can remember. Then you try to make all the same things happen again, exactly as before. You tell the guests to wait until a specific time and then to engage in a specific conversation. Then they must eat a specific food item. At another specified time, they must play a specific game. They must do each of the things that happened at the first party. That way, it just has to be as much fun as the first party was. It's mandatory that the second party is just as much fun as the first. The problem is that the success of a party has a sensitive dependency on initial conditions (or something like that).

I think something akin to retrospective coherence is behind the parade of self-styled experts who think they have a simple solution for complex organizational problems. Like the blind men, each has enjoyed a positive experience solving similar problems in another organization. They assume that if they can just duplicate as much of that previous experience as possible, they can achieve the same positive results in other organizations, as well. It's mandatory that it should work!

Another handy buzz-phrase that may apply to the blind man approach to organizational change is ontological myopia. This is the phenomenon whereby we tend to see only a limited range of possibilities based on our personal past experience or based solely on familiar models. Some of the self-styled experts who "know" The Answer just can't see that there are many other possible solutions besides the one (or the few) they have experienced. In fact, the concept of ontological myopia may lead us to recognize even deeper complexities in these complex systems that are human organizations. According to Dave Snowden of the Cynefin Centre, as described in a paper entitled "A new perspective on culture," there are a couple of assumptions underlying much of "management science" —

Firstly, any organisation is a system in which cause and effect relationships exist and are knowable in such a way that we can create predictable and empirically verifiable models of the behaviour of the system. This is an ordered ontology.

Secondly, organisations are aggregations of distinct and autonomous individuals who assemble into collectives on the basis of a rational assessment of some anticipated return and whose motivations can be managed through incentives and penalties.

And...

...the nature of the ontology determines the epistemological possibilities; translated, the nature of the system determines the nature of the way in which things can be known.

You say: Well, that sounds reasonable, but how can we possibly "know" cause and effect relationships in a complex system? How can we identify changes that will change the equilibrium, resulting in sustainable, positive change? The tools we normally use for root cause analysis, like fishbone diagrams and swimlane diagrams, don't come close to modeling the organization as a whole. And if you're correct in thinking the organization is truly a complex system, then we won't be able to determine cause and effect relationships any more accurately than we can determine the effect of a butterfly's wings in Brazil on a blizzard in Siberia.

I say: We don't have to model every detail. We don't have to try and convert the complex system into a simple one just so that we can write a cake recipe for it. Nor must we resort to the opposite extreme and surrender to an incomprehensibly-complex system that can't be analyzed. Dave Snowden helps us with the problem by identifying three ontologies: Ordered, complex, and chaotic. An ordered ontology has straightforward cause-and-effect relationships that can be known and about which predictive models can be made. With respect to the Glouberman-Zimmerman concept of simple, complicated, and complex systems, we can say that simple and complicated systems are characterized by ordered ontologies. We needn't worry about chaotic ontologies because an organization with that characteristic would not be sustainable as a commercial entity. So, to understand the dynamics of the Great Red Spot of our organization, we can approach it as a complex ontology. Snowden describes a complex ontology as follows:

...the nature of [the agents] and the number of interactions are such that cause and effect relationships, although they exist, can only be understood when they have stabilised: [they] are subject to retrospective coherence. Managing in a complex space is more like managing children. The volatility of the relationships and interactions is such that all that can be done is to manage patterns: patterns that we want we stabilise...and when we get very clever we stimulate the interactions and agents in such a way that desirable patterns are more likely to form.

You say: You call that "help?" He's basically saying there's no way we can understand all the cause and effect relationships in a system as complex as a human organization. He's saying everything we thought we knew about management science amounts to a couple of misconceptions. It means there's no solution!

I say: Bear in mind we are working in a limited domain. We don't have to solve the general problem of complex systems. We only have to find practical ways to improve the effectiveness of software delivery in the context of enterprise information systems.

Let's think about the Great Red Spot analogy a bit more. As the storm system started to rotate, its edges pushed against the surrounding atmosphere, creating smaller eddies that turn in a contrary direction. Those eddies are driven by the same convection currents that drive the main one, so they picked up speed. Since they're smaller than the Red Spot, wind speed within them is higher. Thus, they push back against the edges of the Red Spot, holding it in place and giving it a consistent shape and size.

To bring the analogy back home, the Red Spot represents the overall flow of activity in the organization. Surrounding it are the locally-optimized "standard" procedures that each working group has established to protect itself from the general organizational dysfunction. Like the eddies pushing against the edges of the Red Spot, these procedures exacerbate and amplify the broader organizational dysfunction even as they attempt to mitigate that same dysfunction. They provide the forces that establish and sustain the equilibrium of the organization's overall flow of activity. That's why making localized changes never results in deep or sustained organizational change. When we change the software development methodology used inside the software development group, we don't change the interaction between that group and the rest of the organization. When we add a Six Sigma working group, we simply add yet another localized group that adds its own forces to the edges of the main flow. Small changes are simply absorbed, just as smaller storms are absorbed by the Great Red Spot.

There's a handy little tool from the world of Systems Thinking called a Diagram of Effects, sometimes also known as a Causal Loop Diagram. The interesting thing about this tool is that we can use it to tease out the cause and effect relationships that matter for our purposes, even if we stop far short of identifying every flap of every butterfly's wings.

In a complex system, any given event or situation may be related to a second event or situation as both a cause and as an effect. As the Great Red Spot came into being, its rotation caused smaller eddies to form around its edges. As those eddies gained energy, they caused the Great Red Spot to hold its shape and size. Likewise, the defensive behaviors of localized groups in an organization are both effects and causes of the general organizational dysfunction.

The Diagram of Effects helps us identify these relationships. Donald Gray offers a mercifully brief description of the tool, and Nynke Etk Fokma has a slightly longer and more entertaining one. Here's an example of a Diagram of Effects. (It has nothing to do with the text of this article. I just grabbed it from a website to use as an illustration.)

The little cloud shapes represent things that are quantifiable or measurable. They need not be things that anyone has actually measured; just things that can be measured or compared in some way. The arrows represent influence. A plain arrow denotes an influence that increases the second thing. This particular example doesn't show any negative influences, but if it did they would have little minus signs next to the arrows. In that case, a thing causes less of the other thing to happen. We might find, for instance, that an increase in the number of production support tickets causes an increase in the amount of attention paid to QA testing; an increase in the amount of attention paid to QA testing leads to a reduction in software defects; and a reduction in software defects leads to a reduction in the number of production support tickets.

There are two types of causal loops that are of particular interest for our purposes. A balancing loop is one that is in an equilibrium state. For instance, if A increases B, B increases C, C decreases D, a decrease in D increases E, and an increase in E decreases A, then the causal loop may be sustained indefinitely. The production support example above is a balancing loop. Now assume that there's a smaller causal loop that goes like this: F increases G, G increases A, an increase in E increases F. As long as things go on this way, F keeps increasing. This is a reinforcing loop.
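If you'd rather see the loop arithmetic than squint at arrows, here is a minimal sketch in Python. It rests on a convention that is common in Systems Thinking but that I haven't spelled out above, so treat it as an assumption: give each plain arrow a +1 and each minus-sign arrow a -1, multiply the signs around the loop, and call a negative product "balancing" and a positive product "reinforcing." The names `LINKS`, `TICKET_LINKS`, and `classify_loop` are made up for illustration; they don't come from any tool or library.

```python
from math import prod

# +1: more of the source leads to more of the target (a plain arrow).
# -1: more of the source leads to less of the target (an arrow with a minus sign).
# These links encode the A..G example from the text.
LINKS = {
    ("A", "B"): +1,
    ("B", "C"): +1,
    ("C", "D"): -1,
    ("D", "E"): -1,  # a decrease in D increases E
    ("E", "A"): -1,
    ("F", "G"): +1,
    ("G", "A"): +1,
    ("E", "F"): +1,
}

# The production support example, encoded the same way.
TICKET_LINKS = {
    ("tickets", "qa_attention"): +1,
    ("qa_attention", "defects"): -1,
    ("defects", "tickets"): +1,  # fewer defects, fewer tickets
}


def classify_loop(nodes, links):
    """Classify a closed loop given as an ordered list of node names."""
    # Pair each node with its successor, wrapping around to close the loop.
    pairs = zip(nodes, nodes[1:] + nodes[:1])
    polarity = prod(links[pair] for pair in pairs)
    return "reinforcing" if polarity > 0 else "balancing"


if __name__ == "__main__":
    print(classify_loop(["A", "B", "C", "D", "E"], LINKS))
    print(classify_loop(["F", "G", "A", "B", "C", "D", "E"], LINKS))
    print(classify_loop(["tickets", "qa_attention", "defects"], TICKET_LINKS))
```

The A-through-E loop has three minus signs, so it comes out balancing. The loop that runs through F and G picks up only two of them, so it comes out reinforcing. The sketch only classifies loops you hand it; finding the loops in a real organization is still the hard part.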

The way I visualize this is that the Great Red Spot is like a large balancing loop, and the smaller eddies around it are like reinforcing loops that feed effects into the main balancing loop. In an organization, the large balancing loop is the overall flow of activity in the organization, and the smaller reinforcing loops are the locally-optimized procedures each working group has defined to control (discourage?) interactions with other working groups.

You might find, for instance, that problems in getting clean code out of the development group prompted the quality assurance group to introduce heavier-weight procedures for promoting code to QA. Having to deal with those new procedures led the development group to add new procedures of its own to "force" the QA group to be more specific about what they were willing to accept. Because this increased the workload on the QA group, they designed new procedures to inconvenience the developers even further. And so the dysfunction evolved, tit for tat.

If you make a single, localized change, it won't fix the organizational problem. One obvious change might be to get the developers to use software engineering techniques that tend to reduce defects in the code, such as continuous integration and test-driven development. If you don't also make changes in the QA group, they will continue to demand that the developers follow inconvenient, heavyweight procedures to promote code into QA. This will cause the developers to retaliate with new procedures of their own, and here we are right back where we started.

When we make localized changes only, and we fail to address the forces that hold an organization's overarching balancing loop of activity in place, we achieve results like this:

It's a photo of the impact of comet Shoemaker-Levy 9 Fragment G on Jupiter, taken at the Mount Stromlo and Siding Spring Observatories, Australia, on July 18, 1994. The impact of fragment G alone released energy equal to about 250 million megatons of TNT and created temperatures of more than 50,000 degrees Fahrenheit. The smallest of the other fragments struck with the energy of about 11 million megatons of TNT. The Great Red Spot was unaffected. To affect the Great Red Spot, you'd have to lengthen the rotational period of the planet.

In the hypothetical example of the development group and the QA group, the dysfunction manifests at the interaction boundary between the two groups. Changing the internal processes within either or both those groups does not change their interaction. Changing the interaction between development and QA does not change the interaction between business analysis and development, or between information security and enterprise architecture, or between the IT department and the lines of business. Those changes amount to just so many comet fragments. They are absorbed and disappear without changing the organization in any fundamental or lasting way.

When we start to monkey with the interaction boundaries between working groups, we're flirting with organizational structure change. That may actually be the right thing to do. After all, those tit-for-tat defensive behaviors between the hypothetical development group and the hypothetical QA group wouldn't exist at all if we simplified the hypothesis and had everyone in the same group. Come to think of it, why are they organized in different groups, anyway? When we can answer that question, we'll have insight into the kinds of changes that might actually make a difference.

Present-day organizational structures are a result of the evolution of conventional processes and practices. Retrospective coherence causes us to assume these structures are the only ones or best ones possible. When we try to change processes and practices but we leave the enclosing structure unchanged, it's likely that the new processes and practices won't fit comfortably inside the old structure. After a certain amount of painful rubbing against the walls of the enclosure, those new and improved processes and practices are liable to atrophy. Maybe it would be a good thing to change the structure so that better processes and practices can thrive. Now, that's a whole 'nother can of worms than just introducing yet another new process framework.

