A practical approach to the Rebuild Problem

The most interesting part of a conference is often not the exhibit hall, the lectures, or the keynote address, but the hallway conversations. Last week I attended FROSSTcon and had a fascinating hallway conversation with QA lead and build manager Ryan Maki. We ended up talking about the problems surrounding rebuilds and how to make them as accurate as possible. Interestingly enough, rebuilds have long been a head-scratcher for me, simply because an accurate rebuild is nearly impossible to achieve, and the work-to-value ratio is too low: the effort needed to get even a fairly accurate rebuild is undercut by the inherent inconsistency that plagues rebuilding old code.

Consider what it takes to rebuild a six-month-old project and end up with the same binary as the original: You need to ensure that every build input (source code, environment variables, settings, etc.) and tool (compilers, artifact repositories, scripts, and even hardware) is exactly the same as it was six months ago, and ensure the process is executed exactly the same way. That's a steep hill to climb.
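
To make "exactly the same" something you can check rather than hope for, one option is to fingerprint the inputs you can capture. The Python sketch below is only an illustration, not part of Ryan's process or any AnthillPro feature: the function name `fingerprint_build_inputs`, the dictionary layout, and the sample tool versions are assumptions. It deliberately ignores hardware and how the process is executed, which are much harder to capture.

```python
import hashlib
import json


def fingerprint_build_inputs(sources, env, tools):
    """Return a SHA-256 hex digest over the capturable inputs of a build.

    sources: dict of file path -> file contents (bytes)
    env:     dict of environment variable name -> value
    tools:   dict of tool name -> version string
    Hardware and the way the process is executed are NOT captured.
    """
    h = hashlib.sha256()
    # Iterate in sorted order so the digest does not depend on dict ordering.
    for path in sorted(sources):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(sources[path]).digest())
    # sort_keys=True gives a canonical serialization of the mappings.
    h.update(json.dumps(env, sort_keys=True).encode("utf-8"))
    h.update(json.dumps(tools, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


# Hypothetical inputs for an original build and a rebuild six months later.
original = fingerprint_build_inputs(
    {"src/main.c": b"int main(void) { return 0; }\n"},
    {"CFLAGS": "-O2"},
    {"gcc": "4.1.2", "make": "3.81"},
)
rebuild = fingerprint_build_inputs(
    {"src/main.c": b"int main(void) { return 0; }\n"},
    {"CFLAGS": "-O2"},
    {"gcc": "4.1.2", "make": "3.82"},  # one tool quietly drifted
)
print(original == rebuild)  # False
```

A single drifted tool version is enough to change the fingerprint, which is the whole difficulty in miniature.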

That is exactly what I challenged Ryan with when he told me that rebuilds were critical for them. He stood firm, insisting that their rebuilds were fine. I objected that ensuring an accurate rebuild is a storage nightmare, because you need a virtual machine snapshot of every build you might someday want to rebuild in order to lock in the tools, hardware, variables, compilers, and so on. It just seems like too much trouble for too little reward.

Ryan then walked me through an almost reasonable way to go about rebuilds. He outlined an interesting process for capturing and maintaining rebuilds:

  1. Establish a clean virtual machine image that acts as the standard build machine blank slate.
  2. Store all the build tools in a versioned repository.
  3. Build the build machine. On a schedule (nightly or weekly is reasonable, according to Ryan), throw away the build machine image, reload the blank-slate image, and run the build tool installation script, which installs known versions of all the tools the same way every time.
  4. Continue building this reproducible build platform.

This approach makes a lot of sense, and it has almost changed my opinion about rebuilds. Because the build machine is regenerated from the same blank-slate image and installation script each time, a rebuild run on a freshly rebuilt build machine should get the same build environment as the original build. This gets pretty close to a truly reproducible build: any drift in the build environment is regularly wiped clean, and the process starts out as it was on day one. The result is greatly improved rebuild accuracy.
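
The wipe-and-reload cycle in steps 1 through 3 can be sketched in a few lines of Python. This is a toy model under stated assumptions: the "image" is a dictionary, `TOOL_MANIFEST` stands in for the pinned tool versions kept in the versioned repository, and the tool names, versions, and install paths are hypothetical. The point is only that provisioning always starts from an untouched copy of the blank slate and installs the same tools in the same order, so hand-made drift does not survive a refresh.

```python
import copy

# Hypothetical stand-in for the clean virtual machine image (step 1).
BLANK_SLATE = {"image": "base-image", "tools": {}, "env": {}}

# Hypothetical pinned versions, as they might be kept in the versioned
# tools repository (step 2).
TOOL_MANIFEST = {"jdk": "1.5.0", "ant": "1.7.0", "gcc": "4.1.2"}


def provision(blank_slate, manifest):
    """Step 3: install known tool versions onto a fresh copy of the blank slate."""
    machine = copy.deepcopy(blank_slate)  # never modify the golden image itself
    for name in sorted(manifest):  # install in the same order every time
        version = manifest[name]
        machine["tools"][name] = version
        machine["env"][name.upper() + "_HOME"] = "/opt/%s/%s" % (name, version)
    return machine


def scheduled_refresh(_old_machine, blank_slate, manifest):
    """Throw the old build machine away and rebuild it from scratch.

    The old machine's state is deliberately ignored, so nothing carries over.
    """
    return provision(blank_slate, manifest)


machine = provision(BLANK_SLATE, TOOL_MANIFEST)
machine["env"]["EXTRA_PATH"] = "/home/dev/bin"  # drift someone added by hand
machine = scheduled_refresh(machine, BLANK_SLATE, TOOL_MANIFEST)
print("EXTRA_PATH" in machine["env"])  # False
print(machine == provision(BLANK_SLATE, TOOL_MANIFEST))  # True
```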

Keeping the build process itself consistent still needs to be addressed, and this is where the build management server comes in. For example, AnthillPro's archive/unarchive functionality does a pretty good job of supporting rebuilds. When a build is archived, the built artifacts are discarded but the build metadata is kept. When a build is unarchived, AnthillPro can run the build process again with the same parameters that were used originally.
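
The archive/unarchive idea can be modeled roughly as follows. This is not AnthillPro's API; `BuildRecord`, `archive`, `unarchive`, and `toy_build` are hypothetical names used to show which information has to survive archiving (the source revision and the build parameters) for a later replay to mean anything. Replaying the parameters only reproduces the build if the environment is reproduced too, which is where a regenerated build machine comes in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class BuildRecord:
    build_id: int
    revision: str  # source revision the build was made from
    parameters: Dict[str, str]  # properties the build process ran with
    artifacts: Optional[Dict[str, bytes]] = None  # None while archived


def archive(record: BuildRecord) -> BuildRecord:
    """Discard the built artifacts but keep the metadata needed to rebuild."""
    record.artifacts = None
    return record


def unarchive(record: BuildRecord,
              run_build: Callable[[str, Dict[str, str]], Dict[str, bytes]]) -> BuildRecord:
    """Re-run the build with the same revision and parameters as the original."""
    record.artifacts = run_build(record.revision, dict(record.parameters))
    return record


def toy_build(revision: str, parameters: Dict[str, str]) -> Dict[str, bytes]:
    """Stand-in for a real build whose output depends only on its inputs."""
    label = "%s %s" % (revision, sorted(parameters.items()))
    return {"app.jar": label.encode("utf-8")}


record = BuildRecord(42, "r1234", {"target": "release"})
record.artifacts = toy_build(record.revision, record.parameters)
original_artifacts = dict(record.artifacts)

archive(record)
print(record.artifacts)  # None
unarchive(record, toy_build)
print(record.artifacts == original_artifacts)  # True
```

The toy build is deterministic by construction; a real build only behaves this way if its tools and environment are held constant.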

A few holes remain. In theory, someone could tamper with the build machine between the time it was generated and the time a particular build is executed on it, resulting in inaccurate rebuilds; short of the headache of setting up security protocols, there is no way around this. Additionally, the build itself may set environment variables or otherwise change the machine, so a build run on a long-lived build server can behave differently from a rebuild run on a freshly reproduced one. We could produce a fresh build server for every build, but that would be difficult to combine with CI. Finally, it's very difficult to control for a machine simply flipping a bit somewhere, or for time-based behaviors.
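
Two of these holes can at least be detected, if not closed. The sketch below, again hypothetical Python rather than any product feature, records a fingerprint of the build machine when it is generated and refuses to rebuild if the machine no longer matches, and it diffs environment variables before and after a build to surface anything the build set. It assumes the recorded fingerprint is stored somewhere a tamperer cannot also modify, and it does nothing about flipped bits or time-based behaviors.

```python
import hashlib
import json


def machine_fingerprint(machine_state):
    """SHA-256 digest of a build machine's recorded state (tools, env, ...)."""
    blob = json.dumps(machine_state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def check_before_rebuild(recorded_fingerprint, machine_state):
    """Refuse to rebuild on a machine that no longer matches its recorded state."""
    if machine_fingerprint(machine_state) != recorded_fingerprint:
        raise RuntimeError("build machine changed since it was generated")


def env_changes(before, after):
    """Return {name: (old, new)} for variables a build added, removed, or changed."""
    names = sorted(set(before) | set(after))
    return {n: (before.get(n), after.get(n))
            for n in names if before.get(n) != after.get(n)}


# Recorded when the build machine is generated (hypothetical state).
state = {"tools": {"gcc": "4.1.2"}, "env": {"CFLAGS": "-O2"}}
recorded = machine_fingerprint(state)
check_before_rebuild(recorded, state)  # passes silently

# Environment observed after a build ran on the machine.
after_build = {"CFLAGS": "-O2", "BUILD_CACHE": "/tmp/cache"}
print(env_changes(state["env"], after_build))
# {'BUILD_CACHE': (None, '/tmp/cache')}
```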

Though Ryan's approach is compelling, I still believe you are better off carefully tracking the built software in what ITIL calls a "Definitive Software Library". Such libraries, including AnthillPro's Codestation, provide clear binary traceability in a relatively painless fashion. Essentially, if you care about a build, don't throw it away: hold onto it and track it. But if that just isn't practical, or regulatory requirements demand precise rebuilds, try out Ryan's process and let me know how it goes. In the end, for an idea that came out of a hallway conversation, you could do a lot worse than a consistent build process running on top of reproducible virtual hardware.
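
For comparison, the core of a definitive software library is small: store each build's output once, keyed by build, with a checksum and the source revision it came from. The class below is an in-memory illustration with hypothetical names; it is not Codestation's interface, and a real library would add persistent storage, backups, and cleanup policies.

```python
import hashlib


class SoftwareLibrary:
    """In-memory illustration of a definitive software library (hypothetical)."""

    def __init__(self):
        # (project, build_id) -> (sha256 hex digest, source revision, artifact bytes)
        self._entries = {}

    def store(self, project, build_id, revision, artifact):
        """Record a build's output once; entries are never overwritten."""
        key = (project, build_id)
        if key in self._entries:
            raise ValueError("%s build %s is already stored" % (project, build_id))
        digest = hashlib.sha256(artifact).hexdigest()
        self._entries[key] = (digest, revision, artifact)
        return digest

    def fetch(self, project, build_id):
        """Return the original artifact after verifying its checksum."""
        digest, _revision, artifact = self._entries[(project, build_id)]
        if hashlib.sha256(artifact).hexdigest() != digest:
            raise RuntimeError("stored artifact no longer matches its checksum")
        return artifact

    def trace(self, artifact):
        """List (project, build_id, revision) for every build that produced this binary."""
        digest = hashlib.sha256(artifact).hexdigest()
        return [(project, build_id, revision)
                for (project, build_id), (d, revision, _a) in self._entries.items()
                if d == digest]


library = SoftwareLibrary()
library.store("billing", 42, "r1234", b"binary contents of build 42")
binary = library.fetch("billing", 42)
print(library.trace(binary))  # [('billing', 42, 'r1234')]
```

Instead of rebuilding build 42, you fetch it, and `trace` answers the traceability question in reverse: given a binary, which build and revision produced it.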


Re: A practical approach to the Rebuild Problem

Interesting! We were just discussing this very issue yesterday in our AnthillPro design meeting. It is definitely a complex problem.

Re: A practical approach to the Rebuild Problem

If you don't have a firm regulatory requirement to do rebuilds, I think it's just not worth the trouble. Back up your Codestation repository and make sure your cleanup policies are reasonable.

Why rebuild if you still have the original build?

Now, rebuilds done as part of the lifecycle, to change debugging parameters or something like that, are best done as secondary processes. I believe we have an earlier blog entry on that.

© 2006-2007 Urbancode, Inc.
Anthill, AnthillPro, and AnthillOS are trademarks of Urbancode, Inc.
All other trademarks are owned by their respective owners.
tel: (216) 858-9000 fax: (216) 858-6902 email:info@urbancode.com