Continuous Integration: Was Fowler Wrong?
It's about tests, not builds
While rereading Martin Fowler's classic paper, Continuous Integration, it struck me that its approach to Continuous Integration (CI) is fundamentally flawed. Fowler, like most of the CI community, seems to argue that CI is about building rather than testing. This basic misconception, permeating an otherwise good paper, has contributed to poor tool designs that are focused on build automation and, perhaps more importantly, to untold numbers of teams following bad practices. The problem is clear in Fowler's definition of CI (emphasis added):
"Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible."

At first, the definition reads as a perfectly reasonable statement. Indeed, constant integration improves team communication and decreases integration expense. A build with tests does help with early detection of quality degradation, when defects are cheaper to fix. The problem, however, is that with the exception of compilation problems, builds do not detect integration errors; tests do.

In the paper, CI is fundamentally about integrating code commits and frequent builds that verify those commits. This model may work well for trivial projects. Because all the tests can be run from within a build in the trivial case, we see the build as validating the integration. However, when the tests needed to adequately verify the integration no longer fit in a fast build, the illusion that builds validate integrations dissipates. The focus on builds, which is reinforced by Fowler’s paper, subtly corrupts a practice that should be founded on good, fast testing. Tests need to take center stage, and the build needs to be considered just a simple test of compilation.

Where Build Focused CI Practices Fail

To be truly successful in CI, Fowler asserts that the build should be self-testing and that these tests include both unit and end-to-end testing. At the same time, the build should be very fast -- ideally less than ten minutes -- because it should run on every commit. If there are a significant number of end-to-end tests, executing them at build time while keeping the whole process under ten minutes is unrealistic. Add in the demand for a build on every commit, and the requirements start to feel improbable.
The options are either slower feedback or the removal of some tests.

In the 2006 rewrite, Fowler addresses the problems of long-running tests by suggesting a "staged build." In a staged build, fast tests are run in "commit builds" and slower tests are run as part of "secondary builds." Commit builds would then provide quick feedback on the most important issues while the secondary builds execute additional tests to detect less obvious integration errors. This is certainly an improvement over the difficult situation previously discussed, but in practice the many types of builds required by this approach are problematic. How does one ensure that the sources a secondary build uses match those that first passed a commit build? How does one relate the results of various slow builds to each other and to the results of the commit build? These types of problems tend to be difficult to solve, often requiring a good deal of cleverness and the excessive use of source control labeling/tagging, and at best they remain only partly solved. Further, to practice CI, speed is of the essence; however, the staged build approach reruns the same compilation several times when running different tests with each build. The extra building wastes resources that could be running tests.

Other complications arise when CI is scaled beyond the trivial project. In an enterprise environment, many tests that detect integration problems do not reside in source control, where build scripts can easily launch them. The tests may live in enterprise testing tools such as Borland's Silk Central or HP's Quality Center. A QA team may have testers devoted to testing recent builds and ensuring that the new functionality actually works. It’s difficult to have build scripts run these outside systems, and manual testing simply does not fit into the scope of an automated build script. Added to this, the obsession with making everything a build hurts traceability, limiting what can be done and wasting time.
Unfortunately, a number of tools are built around this principle of automating builds of various types. As one would expect from the CI community, these automated build systems typically -- and inaccurately -- call themselves "Continuous Integration Servers." A focus on builds restricts these tools to providing proper CI support only for trivial projects. Fowler's paper likely contributed to this unfortunate state of affairs.

A Better Way

By discarding build as a focus, what remains are integrations and tests of those integrations. In practice, developers continue to integrate many times a day, and tests are run to see if errors were introduced during those many integrations. Each set of tests is run as often as there is something new to test and resources are available.

The first and most fundamental test is the compile test. On every commit, a process gets the source code and compiles it. By adding the execution of some additional fast tests, this process suddenly looks very much like Fowler's "commit build." It's the last build necessary though, and so for our purposes it will just be "the build." When the build is done, the team is notified of its status and of any critical problems that need to be addressed.

The various other slower tests still need to happen. Instead of several "secondary builds", the other processes are simply functional tests, stress tests and deployments to manual testing environments. Tests are not builds; deployments to QA are not builds; and neither should be called builds.

In order to perform these various tests, built software is still required. Fortunately, the build creates the software we want to test every time there is a commit. If there is an hour-long functional test process, it merely needs to be able to pick up the most recently built software that has passed fast tests, move it to the functional testing environment, and run the longer tests on the artifacts.
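The idea of building once and then running slower tests against already-built software can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular CI server works: `Build`, `BuildRegistry`, `run_secondary`, and the artifact names are all hypothetical, and the "functional test" is a stand-in function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Build:
    """Artifacts produced by the one build for a commit (hypothetical model)."""
    commit: str
    artifact: str  # hypothetical identifier for the built software
    results: Dict[str, bool] = field(default_factory=dict)  # test name -> passed


class BuildRegistry:
    """Keeps builds in commit order, with test results recorded against each."""

    def __init__(self) -> None:
        self._builds: List[Build] = []

    def record_build(self, commit: str, artifact: str, fast_tests_passed: bool) -> Build:
        build = Build(commit, artifact, {"fast": fast_tests_passed})
        self._builds.append(build)
        return build

    def latest_passing(self, required: str = "fast") -> Optional[Build]:
        """Return the newest build whose `required` tests passed, if any."""
        for build in reversed(self._builds):
            if build.results.get(required):
                return build
        return None


def run_secondary(registry: BuildRegistry, name: str,
                  test: Callable[[str], bool]) -> Optional[Build]:
    """Run a slower test against the newest good build without rebuilding it."""
    build = registry.latest_passing("fast")
    if build is None:
        return None  # nothing worth testing yet
    build.results[name] = test(build.artifact)  # result attaches to that build
    return build


registry = BuildRegistry()
registry.record_build("c1", "app-c1.zip", fast_tests_passed=True)
registry.record_build("c2", "app-c2.zip", fast_tests_passed=False)

# Stand-in for an hour-long functional test process.
tested = run_secondary(registry, "functional", lambda artifact: artifact.endswith(".zip"))
print(tested.commit, tested.results)  # c1 {'fast': True, 'functional': True}
```

Because every result is recorded against the build it was run on, there is no separate "secondary build" whose sources have to be matched back to a commit build.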
The longer test process can run every hour while the fast build runs several times an hour. If the results of those tests -- and any other tests run against this one build -- are collected together, one can get an increasingly complete view of the quality of the software, and can be ready to correct faults caused by new code. To do this well, the system needs to be able to reach beyond the confines of tests stored in source control and run tests stored in enterprise QA systems.

As this understanding of CI permeates the community, there should be an increasing number of tools that provide facilities to run secondary test processes against applications built earlier in the day or week. And this is happening right now. My employer, Urbancode, has used the test approach to CI in its CI Server for some time now, and other tools have recently started to adopt this strategy as well. Teams using this new breed of CI Server no longer need to juggle many builds. The freedom to act on existing builds allows the CI server to be used as the basis for a release management system that helps move the software out of the testing environments and into production.

All of this, though, is only possible by shifting the focus of CI theory and tooling from build to test. As such, a revised definition would be:

"Continuous Integration is a software development practice where members of a team integrate their work frequently. Integrations are verified by tests (including build) to detect integration errors as quickly as possible."

- Eric Minick, Urbancode

Re: Continuous Integration: Was Fowler Wrong?
Eric,
I don't get your CI criticism. If you build as quickly and as often as possible, how can you tell in which build things went wrong? For me, one of the main advantages of CI is that you know exactly what test in what build went wrong, and that you are able to pinpoint the error(s) to the (hopefully) small amount of code you just checked in.
I cannot find the answer to this in your vision.
And, as far as I'm concerned, I don't see many differences between your approach and Martin Fowler's - but maybe that's just me :)
CI builds - Haste Makes Waste
Continuous integration (because of the time constraint) is practically limited to incremental builds. Unfortunately, because incremental build techniques are practically based on comparing file timestamps, they cannot guarantee that all the affected components will be rebuilt. What this means is that a successful incremental build can't even guarantee that the code set of the working revision is actually even compilable, let alone runnable. (this is why CI servers need to schedule clean builds every so often)
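One way a timestamp comparison can miss an affected component is sketched below. The comment does not say which failure mode it has in mind, so this is an assumed example: a dependency that the build script never declared. The `is_stale` function and the file names are hypothetical, and the modification times are made-up numbers rather than values read from disk.

```python
from typing import Dict, List


def is_stale(target: str, declared_deps: List[str], mtimes: Dict[str, float]) -> bool:
    """Make-style check: rebuild if the target is missing or older than a declared dependency."""
    if target not in mtimes:
        return True
    return any(mtimes[dep] > mtimes[target] for dep in declared_deps)


# Hypothetical modification times; util.h was edited after main.o was built.
mtimes = {"main.c": 100.0, "util.h": 300.0, "main.o": 200.0}

# The build script declares only main.c, although main.c also includes util.h.
print(is_stale("main.o", ["main.c"], mtimes))            # False: the change is missed
print(is_stale("main.o", ["main.c", "util.h"], mtimes))  # True: rebuilt once declared
```

A clean build sidesteps the question entirely, since it never consults the timestamps.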
So a CI build is useless for any sort of extensive testing. And it absolutely must not be used for a build meant for release. All it's really good for is to give the developers a warm fuzzy feeling that their checked in code didn't miss anything really obvious. You could also attempt to test it, but as you say most real test suites take a fairly long time.

The only really useful type of "integration testing" build is the (primarily nightly) clean build. You can pretty much guarantee that as long as your build environment is the same (or remains relatively compatible) you should be able to build exactly the same output from the same revision. And if done overnight, there is also enough time to run a more complete regression test than would be possible for a CI build. Plus - if the build requires any sort of manual testing, the test team cannot run at the speed of the CI build output. Getting a clean build more than once a day is usually more than they can handle.

So why do people use CI builds so fervently? My observation is that it is due to bad project management practice. Generally, development schedules are so messed up that every customer is hollering for output yesterday. This means development managers are generally eager to just GIOTD (get it out the door) ASAP, and to do so, most will cut every corner imaginable to reduce the wait time by even a few hours. A CI build server will produce an incremental build a lot sooner than a clean build system. For a manager under pressure from the customers, it's like a liferaft to a drowning man.

Unfortunately under this kind of pressure, it also generally means that a given build in hot demand will not only not be cleanly built, but that the build might go out with the barest of testing - sometimes just a quick positive test on a few of the more critical fixes in the release. Of course, if a customer receives bugfixes more quickly by hollering, they will remember that next time and repeat the practice.
Correspondingly, a manager will resort to a "firefighting" approach in response, and will also get accustomed to working under this sort of pressure, and to removing all the safety processes - all the time. The end result is buggier software, burned out developers and a dysfunctional relationship with the customers. Not the result one would have expected from the extant hype about CI servers.

So at best, CI builds are only useful to the development team - as a sort of ongoing sanity check - never to be exposed to any sort of project or product management. They are not proper builds. And they provide only the barest level of testing. At worst, CI builds can "feed the fire" - stroke management's ego to deliver slightly more quickly than reasonable. And encourage developers to be lazy - why review your code if the CI build will catch it?

My experience is that avoiding CI builds and sticking only to clean builds encourages better developer practices (such as re-eyeballing the code before committing it), and avoids giving managers the option to jump the queue. The pace of development (and management expectations for such) is slowed down to a more reasonable level - and code quality (and customer satisfaction) goes up. And unless your developers have poor short term memory, reacting to a build report the next day is usually good enough.

CI servers do contribute toward keeping the current code set buildable. If a developer commits something that breaks the build - a CI build can help minimize the amount of time that other developers are affected by it. But again, with good developer practice, commits should almost always leave the code set buildable. So the question is - do you change the behavior of the developer (or possibly just change the developer) - or do you accommodate the build system to the developer? A tough call - sometimes you have to work around the people you have.
But the embarrassment and pain of a nightly build break (or of inconveniencing the team) is a great feedback tool. And a CI build server minimizes the strength of that signal.

CI builds - Haste Makes Waste
Kunida,
I'm very sympathetic. My role at Urbancode is that of a consultant, so I see a lot of build practices. Needing to separate the "official" clean build from the "fast" CI build is something I see. I'm pleasantly surprised, though, at how often we can do a clean build and still get the build done within five or ten minutes. Sometimes we are just working on small projects; other times it's larger projects with limited amounts of time-consuming code generation and modern fast-building languages.

A key to what I view as "clean CI builds" seems to be modular systems where a change to a component only requires that component to be built for basic compile/unit testing, and perhaps another automated (and fast) build of a top level project that assembles a small amount of code and the many components into software that can be tested end to end. The component model is somewhat less "continuous" on the integration side though. While I integrate with my team members on a particular component constantly, the build (and occasional QA approval) sit as gates between my change and integration to the top level project. On larger projects though, this sort of gated CI seems to be a reasonable way to split the difference between the goal of constant integration and the tendency for the cost of a broken build (and its frequency) to grow as the number of developers on the team increases. I've seen this done effectively both using component models as well as clever uses of source control streams that bring rapid integration within the immediate team and slower integration out to the broader team.

Regardless, to whatever degree possible, if I can avoid build types, I prefer to. If my CI build is built exactly the same way as my official build, that's a good thing and worth a time penalty as long as it keeps CI plausible. If it just can't be done, two build loops - one for clean builds and another for incremental builds - can be set up.
We'd like to keep our functional test processes filled with clean builds, but if we can't, hopefully our source control tools and build processes keep our incremental builds good enough that, while we wouldn't release them to production, automated functional testing on them is still worthwhile.

CI builds - Haste Makes Waste
Having invested the time to ensure my incremental builds do in fact build everything, I humbly disagree with this assertion. For many years I felt that only a 'clean build' could be delivered, but this was a reaction to having felt pain from an incremental build. Modern tools like SVN render this problem moot imo.
Re: Continuous Integration: Was Fowler Wrong?
Hi there, Eric:
"The focus on builds, which is reinforced by Fowler’s paper, subtly corrupts a practice that should be founded on good, fast testing."

This corruption is similar in nature to how various other pieces of XP advice such as "doing the simplest thing possible", "ask the customer", "no upfront design", etc. are misinterpreted by those who do not apply their brains and instead become story card focussed and narrow minded.

The "Better Way", as you have put it, is indeed how we at TW Bangalore interpret and apply Martin's advice. It's foolish to keep compiling the same code again, of course, and we here definitely don't indulge in such foolishness. We indeed have a variety of tests, and tests indeed are our focus. If anyone derives joy from repeated compiles, that person definitely needs the message you are giving in your blog post, because he's then checking hard disk performance and power consumption, and not whether the software is building correctly.

The way this "better way" comes into being is by simply observing over a three day period that everyone is waiting for the build to go green and that it's a really long wait! Lots of things are done to ensure that compiles are very fast, and that the quality and quantity of tests are necessary and sufficient - tests too have to be maintained after all. So, the practice of CI teaches us all a lot of things - customers will want functioning code, and waiting for code to be built repeatedly does not give them that sooner than later.

-- Ram

Re: Continuous Integration: Was Fowler Wrong?
Ram,
Glad to hear that's how you guys are doing things. A careful look at Fowler's staged builds leaves open the possibility that what I suggested is exactly what he wants people to do. It's painfully ambiguous, and I see teams going with the constant rebuilding approach so often that I took that meaning, since it's the one that needs to be confronted publicly.

Re: Continuous Integration: Was Fowler Wrong?
Interesting perspective. However, I have to disagree. Although tests are a pivotal piece of CI, they are not the key piece. Constant feedback, IMHO, is the key piece. That feedback comes from a barrage of sources. The results of testing are one source. However, a good CI system will also do a deployment to a test environment and expose it to QA, which provides another source of feedback. Yet another source would be your business owners taking a look at the functionality during development and providing feedback and course correction. It is this feedback that is the key point in CI and any Agile process.
Re: Continuous Integration: Was Fowler Wrong?
Michael,
With all due respect, I think we completely agree. I was quite intentional in dropping "automated" from the description of tests, and while I think automated tests are important, the verification of the product should be done by unit, functional and manual testing. While it's not normally thought of this way, I would tend to count the business user being hands on with a recent build as usability testing and a check of requirements. I fell into the same trap as the paper I was critical of by not being specific enough.

Thanks,
Eric

Re: Continuous Integration: Was Fowler Wrong?
Interesting to think that compilation isn't considered as testing. If it's not a test then how can it fail? The fact that a CI build stops on compilation failure says more about the language than the approach. IMHO CI is about, funnily enough, integration. That involves compilation (if necessary), assembly (if necessary), deployment (if necessary), and "formal" tests.
Re: Continuous Integration: Was Fowler Wrong?
Simon,
I would consider compilation to be the very first test run on newly integrated code. The integration is absolutely what we are trying to encourage, but we need to supply tests to detect errors quickly - since they are now exposed more quickly. But yes, the argument is essentially that "build is just a test" rather than "test is part of build". Giving test (formal or informal; automated or manual) primacy is my goal.

-- Eric

Re: Continuous Integration: Was Fowler Wrong?
Eric,
Good stuff. I won't get pedantic and argue the semantics, as several of the comments that seem to disagree, imo, seem to agree with your position as well. As you know, we are one of those enterprise class build shops. Urbancode's idea of a living build is the backbone of our operation. I agree that the process is about testing, but I see it mostly in the release process as well as the build process. What AHP3 does for us is help to integrate those two different phases. CI does allow me to add lots of quality gates in the build process and gives me more reasons to fail a build sooner as opposed to later. We don't consider Dev deploys as part of the release process but as the final test of the build process, where we can do automated functional testing prior to 'releasing' to QA. There may still be more testing as the product moves down the conveyor belt that is the release process. Things like performance testing and UAT come to mind, though we can do some performance testing way back in Dev too. So for us I wouldn't say it is about any one thing, but I might distill it to timely feedback (the sooner the better), transparency of that feedback and perhaps, ultimately the most important thing to us, traceability, so we can look and show that we did what we say we do.