Should we promote source code or built binaries?

One of the interesting tensions we see is in how various teams promote approved or tested changes towards production and release. In an ideal world, our developers would work on various bug fixes and new features, and as we neared release, a release manager would select the features she wanted and the bug fixes that were tested, create a build based on that source code baseline, and release it. Many source control vendors (especially those with more expensive products) push this approach.

Unfortunately, the changes made for one item on the list can impact other change items in unpredictable ways. A common alternative is to have everyone work on a shared branch. Indeed, this is a standard CI practice. When a build is completed, the produced artifacts are tested in various environments and gain various approvals, and some builds are eventually released into production.
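The build promotion flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the environment names, the `BuildArtifact` class and its methods are hypothetical and do not describe any particular tool. The point it tries to show is that the same build moves forward one environment at a time and is never rebuilt along the way.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical environment order, chosen only for illustration.
ENVIRONMENTS = ["dev", "qa", "staging", "production"]


@dataclass
class BuildArtifact:
    """A single build whose artifacts are promoted without being rebuilt."""
    build_id: str
    approved_in: List[str] = field(default_factory=list)

    def next_environment(self) -> Optional[str]:
        """Return the next environment awaiting approval, or None once released."""
        if len(self.approved_in) == len(ENVIRONMENTS):
            return None
        return ENVIRONMENTS[len(self.approved_in)]

    def approve(self, environment: str) -> None:
        """Record an approval, refusing to skip or repeat an environment."""
        expected = self.next_environment()
        if environment != expected:
            raise ValueError(f"expected {expected!r}, got {environment!r}")
        self.approved_in.append(environment)

    @property
    def released(self) -> bool:
        """True once the build has been approved in every environment."""
        return self.next_environment() is None


build = BuildArtifact("build-42")
for env in ["dev", "qa", "staging", "production"]:
    build.approve(env)
print(build.build_id, "released:", build.released)
```

Trying to approve a fresh build directly in "production" raises a `ValueError` in this sketch, which mirrors the idea that a build earns its way towards release.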

This build promotion model is great, in that it gives us more realistic tests and addresses the problem of unanticipated impact seen in the source code promotion model. However, we've lost the ability to pick and choose exactly what changes we want to include in our release after the work has been done - we can only make that decision before the work for a release is done.

The trend seems to be towards the build promotion model, and that's one that fits quite neatly in AnthillPro's model. The most common other approach that we are seeing is a two-stream model. The first stream takes all developer changes and does CI builds against those. These builds off the development stream tend to be sent only to early testing environments. The second stream receives changes promoted by a release manager from the first. These promotions kick off another build that goes through all the testing environments - retesting the changes to make sure they still work when assembled in this combination.
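As a rough sketch of the two-stream idea, the release stream can be thought of as the subset of development-stream changes that a release manager has promoted, kept in commit order so that the combination can be rebuilt and retested. The function and the change names below are hypothetical and exist only for illustration.

```python
from typing import List, Set


def release_stream_contents(dev_stream: List[str], promoted: Set[str]) -> List[str]:
    """Return the promoted changes in the order they were committed."""
    unknown = promoted - set(dev_stream)
    if unknown:
        raise ValueError(f"not in the development stream: {sorted(unknown)}")
    return [change for change in dev_stream if change in promoted]


# Made-up change identifiers, in commit order on the development stream.
dev_stream = ["fix-login", "feature-export", "fix-timeout", "feature-dashboard"]
promoted = {"fix-login", "fix-timeout"}

release_changes = release_stream_contents(dev_stream, promoted)
print(release_changes)  # this combination is what the second stream rebuilds and retests
```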

This tension between being able to pick and choose features and fixes and actually testing all the changes working together is an interesting one. Neither the classic "Promote this source change to Production state" approach pushed by many SCM vendors nor the "do everything in one stream" approach present in classic CI provides us the control, flexibility and testing that are needed in some environments. Given the choice, I'd lean towards doing everything in a single branch or stream, but I think the hybrid approaches are gaining some traction.

Continuous Integration: Was Fowler Wrong?

It's about tests not builds

While rereading Martin Fowler's classic paper, Continuous Integration, it struck me that its approach to Continuous Integration (CI) is fundamentally flawed. Fowler, like most of the CI community, seems to argue that CI is about building rather than testing. This basic misconception, permeating an otherwise good paper, has contributed to poor tool designs that are focused on build automation and, perhaps more importantly, to untold numbers of teams following bad practices. The problem is clear in Fowler's definition of CI (emphasis added):

Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible.

At first, the definition reads as a perfectly reasonable statement. Indeed, constant integration improves team communication and decreases integration expense. A build with tests does help with early detection of quality degradation, when defects are cheaper to fix. The problem, however, is that with the exception of compilation problems, builds do not detect integration errors; tests do. In the paper, CI is fundamentally about integrating code commits and frequent builds that verify those commits. This model may work well for trivial projects. Because all the tests can be run from within a build in the trivial case, we see the build as validating the integration. However, when the tests needed to adequately verify the integration no longer fit in a fast build, the illusion that builds validate integrations dissipates.

The focus on builds, which is reinforced by Fowler’s paper, subtly corrupts a practice that should be founded on good, fast testing. Tests need to take center stage, and the build needs to be considered just a simple test of compilation.

Where Build-Focused CI Practices Fail

To be truly successful in CI, Fowler asserts that the build should be self-testing and that these tests include both unit and end-to-end testing. At the same time, the build should be very fast -- ideally less than ten minutes -- because it should run on every commit. If there are a significant number of end-to-end tests, executing them at build time while keeping the whole process under ten minutes is unrealistic. Add in the demand for a build on every commit, and the requirements start to feel improbable. The options are either slower feedback or the removal of some tests.

In the 2006 rewrite, Fowler addresses the problems of long-running tests by suggesting a "staged build." In a staged build, fast tests are run in "commit builds" and slower tests are run as part of "secondary builds." Commit builds would then provide quick feedback on the most important issues while the secondary builds execute additional tests to detect less obvious integration errors. This is certainly an improvement over the difficult situation previously discussed, but in practice the many types of build required by this approach are problematic.

How does one ensure that the sources a secondary build uses match those that first passed a commit build? How does one relate the results of various slow builds to each other and to the results of the commit build? These types of problems tend to be difficult to solve, often requiring a good deal of cleverness and the excessive use of source control labeling/tagging, and at best they remain only partly solved. Further, speed is of the essence in CI, yet the staged build approach reruns the same compilation several times when running different tests with each build. The extra building wastes resources that could be running tests.

Other complications arise when CI is scaled beyond the trivial project. In an enterprise environment, many tests that detect integration problems do not reside in source control, where build scripts can easily launch them. The tests may live in enterprise testing tools such as Borland's Silk Central or HP's Quality Center. A QA team may have testers devoted to testing recent builds and ensuring that the new functionality actually works. It’s difficult to have build scripts run these outside systems, and manual testing simply does not fit into the scope of an automated build script.

Added to this, the obsession with making everything a build hurts traceability, limiting what can be done and wasting time. Unfortunately, a number of tools are built around this principle of automating builds of various types. As one would expect from the CI community, these automated build systems typically -- and inaccurately -- call themselves "Continuous Integration Servers." A focus on builds restricts these tools to providing proper CI support only for trivial projects. Fowler's paper likely contributed to this unfortunate state of affairs.

A Better Way

By discarding build as a focus, what remains are integrations and tests of those integrations. In practice, developers continue to integrate many times a day, and tests are run to see if errors were introduced during those many integrations. Each set of tests is run as often as there is something new to test and resources are available.

The first and most fundamental test is the compile test. On every commit, a process gets the source code and compiles it. By adding the execution of some additional fast tests, this process suddenly looks very much like Fowler's "commit build." It's the last build necessary, though, and so for our purposes it will just be "the build." When the build is done, the team is notified of its status and of any critical problems that need to be addressed.

The various other slower tests still need to happen. Instead of several "secondary builds", the other processes are simply functional tests, stress tests and deployments to manual testing environments. Tests are not builds; deployments to QA are not builds; and neither should be called builds.

In order to perform these various tests, built software is still required. Fortunately, the build creates the software we want to test every time there is a commit. If there is an hour-long functional test process, it merely needs to be able to pick up the most recently built software that has passed fast tests, move it to the functional testing environment, and run the longer tests on the artifacts. That can happen every hour while our fast build happens several times an hour.
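A minimal sketch of that selection step, assuming build results are available as simple records: the `Build` class, its fields and the sample history are hypothetical, and `run_functional_tests` is only a stand-in for deploying the stored artifacts and running the slow suite. The slow process reuses what the build already produced instead of compiling again.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Build:
    """One run of the fast build for a single commit (hypothetical record)."""
    number: int
    commit: str
    fast_tests_passed: bool


def latest_testable_build(builds: List[Build]) -> Optional[Build]:
    """Return the newest build whose fast tests passed, or None if there is none."""
    passing = [b for b in builds if b.fast_tests_passed]
    return max(passing, key=lambda b: b.number, default=None)


def run_functional_tests(build: Build) -> str:
    """Stand-in for moving the artifacts to a test environment and testing them."""
    return f"functional tests run against build {build.number} ({build.commit})"


# Made-up history: several fast builds have finished since the last slow run.
history = [
    Build(101, "a1f3", True),
    Build(102, "b7c9", True),
    Build(103, "c2d4", False),  # failed the fast tests, so it is skipped
]

candidate = latest_testable_build(history)
if candidate is not None:
    print(run_functional_tests(candidate))
```

Because the slow process records which build number it tested, its results can later be tied back to the exact commit that produced those artifacts.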

If the results of those tests -- and any other tests run against this one build -- are collected together, one can get an increasingly complete view of the quality of the software and be ready to correct faults caused by new code. To do this well, the system needs to be able to reach beyond the confines of tests stored in source control and run tests stored in enterprise QA systems.
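One way to picture that collection step is to key every test result by the build it ran against. The sketch below assumes results arrive as (build number, test process, passed) tuples; the process names, build numbers and function names are invented for illustration, and results from an outside QA system could be fed in the same shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (build number, test process name, passed?) -- hypothetical record shape
TestResult = Tuple[int, str, bool]


def collect_by_build(results: Iterable[TestResult]) -> Dict[int, Dict[str, bool]]:
    """Group the outcome of each test process under the build it ran against."""
    report: Dict[int, Dict[str, bool]] = defaultdict(dict)
    for build_number, process, passed in results:
        report[build_number][process] = passed
    return dict(report)


def fully_verified(build_report: Dict[str, bool], required: Iterable[str]) -> bool:
    """True only if every required test process ran against the build and passed."""
    return all(build_report.get(name, False) for name in required)


# Made-up results arriving from different test processes at different times.
results = [
    (102, "fast tests", True),
    (102, "functional tests", True),
    (102, "stress tests", False),
    (103, "fast tests", True),
]

report = collect_by_build(results)
required = ["fast tests", "functional tests", "stress tests"]
for number, build_report in sorted(report.items()):
    print(number, build_report, "fully verified:", fully_verified(build_report, required))
```

A test process that has not yet run against a build counts as unverified here, which keeps a build from looking healthier than the evidence supports.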

As this understanding of CI permeates the community, there should be an increasing number of tools that provide facilities to run secondary test processes against applications built earlier in the day or week. And this is happening right now. My employer, Urbancode, has used the test approach to CI in its CI Server for some time now, and other tools have recently started to adopt this strategy as well. Teams using this new breed of CI Server no longer need to juggle many builds. The freedom to act on existing builds allows the CI server to be used as the basis for a release management system that helps move the software out of the testing environments and into production.

All of this, though, is only possible by shifting the focus of CI theory and tooling from build to test. As such, a revised definition would be:

"Continuous Integration is a software development practice where members of a team integrate their work frequently. Integrations are verified by tests (including build) to detect integration errors as quickly as possible."

- Eric Minick, Urbancode

© 2006-2010 Urbancode, Inc.