Saturday, June 19, 2010

Software Development is Not like Building a House

My hope is that when most of you see this, you'll say the equivalent of "no shit". You'll continue, "Of course software development is totally different than building a house." But, for those few hold-overs that think software development can progress in sequential phases like building a house, well, this blog is dedicated to you. I will try to convince you otherwise. At least wish me luck.

Take-Away Points

  1. Building non-trivial software systems is a creative endeavor.
  2. Planning will be driven by the realities of the endeavor, just as much as the endeavor will be driven by the planning.
  3. Coding is design, and so are some of the traditional design methods.
  4. Source code is a blueprint (design document); the compiled application is what the user interacts with.
  5. Construction is the compiling or interpreting of the code into machine instructions.

Analogies

I love analogies. I love them especially when you can use an analogy to reduce a complex issue into a simple one, and use it to show why something does or doesn't make sense. On the other hand, sometimes analogies can be powerful, and yet completely wrong. Their simplicity may leave out crucial aspects of what they are intended to describe. Even when an analogy hasn't accurately captured the essence of what it is trying to describe, its simplicity may be so compelling that people accept the analogy as truth. In these cases, analogies are dangerous. They lead us down the wrong path. They allow us to make decisions that are appropriate at the level of the analogy, yet inappropriate for the actual situation.

An analogy is effectively a model of an unfamiliar or complex concept expressed as something with which we are all familiar, thus making it easier to understand and intuit. As a model, it attempts to express the essence, leaving out unnecessary details.

As with models, analogies have domains of relevance. In other words, an analogy may be useful to explain a concept under certain circumstances. And at the same time, it may be incorrect when used to describe the same concept under different circumstances. A simple example from physics illustrates this idea. Classical mechanics describes motions of bodies well when those motions are slow and the bodies (objects) are not too big or too small. For extremely small bodies or fast-moving particles, classical mechanics becomes a poor description, and the errors it introduces can be quite significant.

In this case, the analogy that software development is like building a house oversimplifies software development. And, over the years, this oversimplification has led to many dramatic failures.

Software Development

For decades people have attempted to describe the software development process and create processes to ensure that software development succeeds. And, for decades, people have failed. Practitioners of software development recognize that software development is not one homogeneous thing. Rather, the development endeavor can vary dramatically depending on:

  1. What is being developed.
  2. The technology choices.
  3. The constraints imposed on the system.
  4. How the system will be used.
  5. The technical and domain skills and experience of those developing the system.
  6. And, I'm sure it depends on many more things.

Except for the simplest software systems, software development is anything but a linear process that proceeds along a well-defined sequence of steps. Assuming that the intended outcome of the development is well understood, which is often not the case, the way that the subject matter is modeled has huge implications for the actual development. By modeling I mean the way the system has been broken into its constituents and how those parts interact. For any given problem, there may be tens to hundreds to an infinite number of different ways of modeling the system. Some models may be better than others for certain situations. Understanding that there are different models requires experience. Developing different models requires skill and experience. And choosing the "best" model requires making trade-offs, skill, and experience. There is a significant artistic component to modeling and design in the same way that there is an artistic component to science. Now I don't want you to walk away with the impression that you can create and select the "best" design before a single line of code has been pounded into the keyboard. That is unlikely, except, again, for the simplest systems. Writing code is an integral part of designing software, which I'll talk about shortly.

In the previous paragraph, I started off the second sentence "Assuming that the intended outcome of the development is well understood". The outcome may be understood. But is it "well" understood? This is unlikely, except, again, for the simplest systems. Let's explore what it means for the "intended outcome" to be "well understood". Conceptually, it means that we know what we want to build. But here's the rub: we need to "know what we want to build" at a sufficient level of detail that will allow us to design/build the system. And the "sufficient level of detail" is an ill-defined concept. Not only can "sufficient" be vastly different for different people, it also changes as new information becomes available. I may know how to select the appropriate non-linear least-squares algorithm for a particular aspect of the system. And at the same time, someone else may not even know what a non-linear least-squares algorithm is. After implementing the non-linear least-squares algorithm that I selected, we may decide that it takes too long to perform the optimization. And therefore, we may have to choose an entirely different approach. You might argue that we should have known about the performance issue up front. But usually you don't know how something will perform until you've built it. You may have developed similar code before and can use that experience to guide you. But this isn't always the case. You could argue further that the performance characteristics should have been defined up front. Sure, that is a valid point. And the overall performance may have been defined. But you can't define the performance characteristics for every method up front, because you don't know what methods will be implemented. And again, we are back to figuring out what is the "sufficient" level of detail with which we need to define the performance characteristics.

At this point you should be seeing a pattern. There are aspects of the system for which you don't have solutions up front. As you solve these parts of the puzzle through development, you may find that you have invalidated some assumptions, and introduced new problems, which again need solutions. And again, solutions to these new problems will invalidate some assumptions, and introduce new problems. You may argue that with better planning, you can avoid these "unknowns" that crop up. But the problem with that argument is this: if you don't know the solution, then how could you ever imagine that you could know the problems that the solution (which isn't yet known) will bring? And furthermore, how would your planning have changed that?

Coding is Designing

I believe that the misunderstanding described in the previous paragraphs largely stems from one central incorrect assumption: design comes first, then the coding (or implementation or build). You may argue that by iterating between designing and coding you are not making that incorrect assumption. But I would take it a step further: design and coding are one and the same. This may seem like a Zen koan.

Design is the process of originating or developing a plan that lays the basis for the making of every object or system.

And a plan can be a procedure used to achieve an objective, a set of plans (or drawings) used to provide instructions to build or fabricate a place or object, a blueprint, an engineering drawing, an architectural drawing, and so on.

If design is the planning that lays the basis for the making of every object or system, then the way in which we express the design is left to us. For example, you could draw the design on a napkin while having a beer at a nice outdoor pub. Or you could fire up a design program and draw class diagrams, object diagrams, sequence diagrams, use-cases, and so on. And this is what we traditionally think of when we think about design. Now here's the thing: a person could no more move into the architect's blueprint of the house than a user of a software system could use the source code describing how the system performs. In both cases, the designs must be built into a tangible thing that can be used by the person.

Design is the "plan" that allows us to make the object. In the example of a house, it is taking the "plan" (blueprints) and "making" (constructing) an "object or system" (the house) in which the person can live. Similarly, in the example of a software system, it is taking the "plan" (source code) and "making" (compiling/building) an "object or system" (the application) with which the person can interact. To take it a step further, the blueprints for a software system are source code. And the "plan" includes the blueprints as well as the traditional design documents I listed in the previous paragraph.
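To make the blueprint/construction split concrete, here is a minimal sketch in Python. The `area` function and the `<blueprint>` label are made up for illustration. Note also that CPython compiles to bytecode rather than machine instructions, so this is only an approximation of the "construction" step described above.

```python
# The "blueprint": source code is just text. Nobody can use it in this form.
# (The area() function is a hypothetical example, not from any real system.)
source_code = """
def area(width, height):
    return width * height
"""

# "Construction": turn the blueprint into something the machine can execute.
# In CPython this produces a bytecode object, not native machine instructions.
code_object = compile(source_code, "<blueprint>", "exec")

# Running the built artifact populates a namespace; this is the "application"
# the user actually interacts with.
application = {}
exec(code_object, application)

print(application["area"](3, 4))
```

The string on its own does nothing. Only after it has been built and loaded is there something a user can call.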

Planning Software Development

I have worked with people who believe that good planning leads to good execution. And I agree. But I disagree with the idea of developing detailed plans for the entire endeavor at the outset, except for the simplest of endeavors. During planning, we must recognize what we don't know. Our planning efforts should not be wasted on developing detailed plans for things that may never happen. All we gain by doing this are precise plans that are completely inaccurate, and need to be redone. In fact, a good plan should have built within it the ability to change and update the plan.

A good plan, violently executed now, is better than a perfect plan next week.
-- General George S. Patton

The fact is that planning will be driven by the realities of the endeavor just as much as the endeavor will be driven by the planning. Planning and execution are not separable for the development of anything but simple systems. Rather, they are intertwined.

The Mountain of Unknown

To see why planning the entire endeavor from the outset is rarely possible, let's look at an example. Suppose your company has been around for some time, say 30 years. Your financial accounting systems were built to meet idiosyncratic needs, written in COBOL, and over the years have been integrated with many additional systems that feed them data and consume their output. Not an uncommon scenario. These systems were updated in 1999 to solve date issues related to the year 2000. But, many of the solutions were "band-aid" fixes. Newly issued accounting standards require substantial modifications to one of the larger financial accounting modules.

At the outset, we logically assume that the financial accounting module that needs modification can simply be pulled out, and a new one popped in. Software should be "modularized", and everyone speaks about "the accounting module". So why not pull it out? Isn't it just like replacing the bathtub in your home with a nice, new whirlpool massage tub with water heating built in? Probably not! Let's examine why we may not be able to just pull out the accounting module and replace it. And keep in mind that building a new system from scratch can be much more complex than replacing an existing one.

Connections to Other Systems

Typically, any interesting system has connections to other systems. Systems need to get fed data and other input. And they produce some output that is likely consumed by other systems or people. How, and to what systems, this system is connected will play an important role in its replacement. The problem in many larger organizations is that not all the connections to the system may be known. This happens, for example, when the system writes output to a shared database or a file. Unless there has been a good process in place to record and manage what systems use what data (database tables, database fields, or files), these connections aren't discovered until they are severed, that is, when the system is replaced and stops writing to those sources.

  1. Do you know all the systems that consume this system's output?
  2. Are there scheduling constraints imposed by other connected systems?
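If a record of who writes and who reads each data source did exist, answering the first question would be a simple lookup. The sketch below assumes such an inventory. Every system and source name in it is hypothetical, and in practice the hard part is that the inventory is incomplete or missing.

```python
# Hypothetical inventory of data sources each system writes to and reads from.
# None of these names come from a real system; they only illustrate the idea.
WRITES = {
    "accounting_module": {"ledger_table", "nightly_totals.csv"},
    "payroll": {"payroll_table"},
}
READS = {
    "tax_reporting": {"ledger_table"},
    "treasury_dashboard": {"nightly_totals.csv", "payroll_table"},
    "payroll": {"ledger_table"},
}


def downstream_consumers(system, writes=WRITES, reads=READS):
    """Map each consumer to the data sources it shares with `system`'s output."""
    outputs = writes.get(system, set())
    consumers = {}
    for reader, sources in reads.items():
        shared = outputs & sources
        if shared and reader != system:
            consumers[reader] = shared
    return consumers


# Which systems break if the accounting module stops writing its output?
for consumer, sources in sorted(downstream_consumers("accounting_module").items()):
    print(consumer, sorted(sources))
```

Any consumer missing from the inventory is invisible to this lookup, which is exactly how severed connections go unnoticed until after the replacement.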

Data Exchange with Other Systems

Because the system is connected to other systems, it exchanges data with other systems. How these data are exchanged drives the number, type, and complexity of the data-exchange mechanisms that need to be built.

  1. Are the inputs given to the system through a file, shared database, proprietary communication protocol, method/function call, web service?
  2. Are the inputs formatted as text, binary, XML?
  3. Are the data handed to the system in bulk, in spurts, real-time, based on events, nightly batch loads?
  4. And you have the same questions for the system's output.

Data Storage

Likely the system you would like to replace stores data and results in some data store. The approach to the data storage will be a big factor in how easy it is to replace the system. If the system has its own, self-contained data store for the data it "owns", and no other systems pull from this data store, then you're in reasonable shape. But, more than likely, the system also writes data into a shared data store. And more than likely, as is usual with older systems that have evolved over time, other systems read that data, and use that data store as an integration mechanism. Now, all of a sudden, you have to worry about potentially unrelated systems.

Part of a Business Process

Adding to the complexity is the fact that this system will, more than likely, be part of a larger business process. How that process is constructed will, again, be a big driver in the complexity of the replacement of this system.

  1. Does the process proceed through a series of systems, scheduled with some Unix "cron" job, where one system writes data, needed by the next system, to a file or database?
  2. Or is there a workflow that manages the process according to a schedule, or events?
  3. What data is needed by the next system?
  4. And will the new system be able to source that data (this is a real concern for commercially developed products)?

Add-Ons that Don't Belong

Legacy systems tend to accrete capabilities over time. And more than likely, some of those capabilities are not a logically consistent part of the system. This tends to be more of a problem when purchasing commercially developed ("off-the-shelf") products than it is for custom builds (either in-house or third-party).

Commercially developed products typically contain capabilities related to the objectives of that product. Additional capabilities that are idiosyncratic to your company are probably not to be found in the commercial product. So what do you do?

  1. Do you customize?
  2. How much customization makes sense?
  3. What about maintenance and updates for the customized product?
  4. Do you know the missing capabilities at the outset?

Lack of Documentation

Finally, in larger, older organizations, there is simply a lack of documentation that describes the systems, their capabilities, their input needs, their dependencies, where they fit into what processes, and the "band-aid" fixes that have been applied to them over the years. The lack of documentation is a big issue. It forces the organization to perform archeological analyses to determine all the systems and processes that are impacted by this replacement.

  1. When replacing the system, how do you know the answers to the previous five concerns?
  2. Do you have someone, or a few people, with that knowledge?
  3. How complete is their knowledge?
  4. What don't they know?

Take-Away Points

Building, replacing, or integrating a system can be quite complex. There are likely thousands to tens of thousands of things to contemplate. Many of them are deeply buried in some undocumented "band-aid" fix, or are a forgotten capability or dependency. Even if you have the developers and business people responsible for the system, expecting them to know all the downstream consumers of the current system's output is unreasonable. The misguided idea that you can analyze and plan the entire effort at the outset, to a sufficient level of detail to allow execution of the plan, will take you down a road of disappointment and failure. Instead, recognizing what you do know and can plan, executing those aspects, while continuing to plan, in real time, as new information becomes available, will yield a higher probability of success. Don't get me wrong. I'm not arguing for jumping in without planning. What I'm arguing for is to start with a possibly imperfect plan, and keep that plan fluid.

Building a House

A common analogy for software development is that software development can be viewed and managed in the same way that you would build a house or a bridge. It amazes me that this analogy has survived to this day. It stuns me that this analogy is still applied to the development of software systems of any real complexity (i.e. most systems). It's like accepting the theory of phrenology in this day and age. Time and time again this analogy has proven inadequate. And yet, software is still developed using processes that are directly modeled on it. Wow!

The idea is that software development progresses sequentially through phases, just as building a house does. First there is the planning for the custom build, in which the customer's desires and needs are gathered. What style of house? How many square feet? How many bedrooms, bathrooms? What type of kitchen? Then, based on the customer's desires and needs, an architect begins designing the house. Most likely, this is an iterative process. Once the design is complete, construction commences. During the construction, there may be a number of inspections by the customer and by regulatory agents (e.g. county inspector). Upon completion, the customer and the regulator perform a final inspection, and the house is delivered.

So what's the problem? This seems reasonable enough. Why not just apply this to software development?

Houses are largely the same. In my neighborhood, my wife and I live in one of four types of houses. Sure, my neighbor's house is the mirror image of ours. But effectively, it is the same house. That means that there are probably thousands, if not tens of thousands, of houses identical to mine built in the U.S. Our house has been built over and over again. Given the architect's blueprints, the construction crew hammers out one house after another.

Houses have more similarities to each other than they have differences. They all have bathrooms, bedrooms, kitchens, entrance ways, toilets, sinks, bathtubs, windows, floors, walls, and doors. Bathrooms have a combination of shower, bathtub, toilet, and sink. Most rooms have electrical outlets. And bathrooms have a water supply and drain. These similarities allow for the creation of a detailed set of building codes that describe almost every aspect of the construction of a typical house. Building codes ensure that load-bearing walls can support the load. They ensure that the gauge of the electrical wire can support the current it must carry, and that the circuit-breakers trip if the current goes too high. Building codes ensure that, in areas with high expected snowfalls, the pitch of the roof is high enough, and the trusses are strong enough to support the weight of the snow. And therefore, it stands to reason that the specifications for our house have been worked out pretty well. What's more, in most cases, different house styles are effectively following the same set of patterns in construction.

The fundamental difference between developing a software system and building a house is that when building most houses we have enough experience to be able to anticipate what needs to get done and in what order. There is virtually no need for experimentation. The same parts have been built many times. There are specialists for specific tasks. On the other hand, for most software development there are many unknowns—both at the outset and ones that emerge during the endeavor. Experimentation is usually required. There are complex interactions between various parts of the system that only become apparent during the endeavor.

There are fundamental differences between building a house and developing a software system. There are also some real similarities if we align them correctly.

Conclusion

The development of any non-trivial software system is a creative endeavor. The steps towards the solution, or even the ultimate solution, may not be completely known at the outset of the endeavor. In fact, during the development of the system, issues that had not even been conceived of at the outset may arise and need to be solved.

When developing a non-trivial software system:

  1. We must accept that we can't always know the solution in its entirety before we begin the endeavor.
  2. And therefore, we cannot plan the entire endeavor at the outset.
  3. And therefore, we must allow the planning to be fluid as new issues arise and must be solved.

The above statements spell out what is fundamentally wrong with the idea that we can plan out the development at the outset of the endeavor and expect to adhere to that plan. Furthermore, when projects go "off the rails", it is not necessarily a shortcoming of the team, but rather the result of a stubborn adherence to the notion that a plan can be created and executed without a real understanding of what truly needs to be done. And sadly, these adherents express a naive surprise when the actual work doesn't follow that plan. We must keep in mind that as the creative endeavor progresses, the solution to one problem may lead to the need to solve another, one that requires planning that could not have been done at the outset.
