Part 2 of Sloppy Deployments

In my previous post, I wrote about the evils of sloppy deployments. Two main practices can lead to what I call "sloppy deployments":

  • Allowing changes to be made directly to the deployed application and thus bypassing the build system,
  • Performing incremental deployments manually without a verification step.

The problem with these practices is that they eventually produce a deployed state that differs from the state of the application as it was built, and that causes trouble. The symptoms range from two installations that were intended to be identical behaving differently, to the application simply not working for no apparent reason.

Last month we looked at possible strategies for preventing sloppy deployments. The first strategy is to lock down the deployment environment so that users cannot access it and therefore cannot make changes directly to the deployed artifacts. We pointed out that locking down the deployment environment may still not guarantee deployment integrity if manual incremental deployments are used. Another strategy we looked at was to use coarse-grained deployment artifacts. Here we were trying to use the human psyche to our advantage: it is much easier to make a change on the development machine and then rebuild and redeploy a single artifact than to explode the artifact in the deployment environment, make the change, and re-archive it.

In this installment, we're going to look at how to discover invalid deployments and how to cure them. If we're dealing with an application that contains many (perhaps hundreds or thousands of) files, then we have to check whether each and every file is exactly the same as what was originally deployed. That is a lot of work if you're going to do it manually.

RPM and Package Verification

Tools like RPM on Linux have a built-in ability to verify that the files on the system are exactly the same as what was originally deployed. Every time a package is installed using RPM, each file installed by the package is logged in the RPM database along with a few attributes that describe the file. This database of packages and their files allows RPM to later verify that an installed package has not been tampered with. Verification works by comparing the attributes of each file in the package, as it currently exists on disk, with the attributes recorded for that file in the RPM database, which reflect the file as it was when the package was installed. The following file attributes are compared by RPM during package verification:

  • file owner
  • file group
  • file mode (permissions)
  • MD5 checksum
  • file size
  • modification time

If verification fails, RPM prints the path of each offending file, along with codes indicating which attributes failed verification, to standard output.
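The comparison RPM performs can be sketched in a few lines of Python. This is a minimal illustration, not RPM's implementation: an in-memory dictionary stands in for the RPM database, and the names `record` and `verify` are hypothetical. The one-letter codes borrow the letters `rpm -V` uses for these attributes, but the output format here is not RPM's.

```python
import hashlib
import os
import stat

# Attribute -> one-letter code, borrowed from the letters `rpm -V` uses
# (S=size, M=mode, 5=digest, U=user, G=group, T=mtime); order is our own.
CODES = [("size", "S"), ("mode", "M"), ("digest", "5"),
         ("owner", "U"), ("group", "G"), ("mtime", "T")]


def file_attributes(path):
    """Capture the attributes that are compared during verification."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return {
        "owner": st.st_uid,
        "group": st.st_gid,
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        "digest": digest,
        "size": st.st_size,
        "mtime": int(st.st_mtime),
    }


def record(paths):
    """Install time: build the 'database' of attributes for each file."""
    return {p: file_attributes(p) for p in paths}


def verify(database):
    """Later: return {path: codes} for every file that no longer matches."""
    failures = {}
    for path, recorded in database.items():
        if not os.path.exists(path):
            failures[path] = "missing"
            continue
        current = file_attributes(path)
        # One character per attribute: the code letter if it changed, '.' if not.
        codes = "".join(letter if current[attr] != recorded[attr] else "."
                        for attr, letter in CODES)
        if codes != "." * len(CODES):
            failures[path] = codes
    return failures
```

Because the modification time is compared as well, a file that was edited and later reverted to its original contents is still flagged (with a `T` but no `5`), which is a useful hint that someone has been working directly on the installation.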

The mechanism built into RPM is close to ideal for discovering hijacked deployments, and curing them is as easy as reinstalling the same package. But RPM is a Linux-centric solution, and even in the Linux world it is not very common for people to package their own web applications as RPMs. It is simply a heavier-weight solution than what we're looking for.

Message Digests and One-Way Hash Functions

There is another way to accomplish pretty much the same thing. Before going into it, though, let's take a closer look at MD5 and hashing algorithms in general.

Message digests, or one-way hash functions (as they are also termed), are well-known creatures in cryptography. The general idea is that, given an input of arbitrary length (say, a file), the hash function converts it to a fixed-length output (the hash value). The key property is that a one-way hash function makes it easy to compute the hash value of a given input but computationally infeasible to find an input that produces a given output. In a good one-way hash function, changing a single bit of the input changes, on average, about half of the output bits.
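Both properties are easy to observe with Python's standard `hashlib` module. The snippet below is an illustration, not a proof: it shows that MD5's output length does not depend on the input length, and counts how many of the 128 output bits change when a single input bit is flipped (the exact count depends on the input, so no specific number is shown). MD5 is used to match the tools discussed in this article; note that it is no longer considered collision-resistant, so `hashlib.sha256` is the safer choice where deliberate tampering, rather than sloppiness, is the concern.

```python
import hashlib


def bits_changed(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Fixed-length output: a 1-byte input and a 1 MB input both give a 16-byte MD5 digest.
small = hashlib.md5(b"x").digest()
large = hashlib.md5(b"x" * 1_000_000).digest()
print(len(small), len(large))  # 16 16

# Avalanche effect: flip one bit of the input and compare the two digests.
original = b"index.jsp"
tweaked = bytes([original[0] ^ 0x01]) + original[1:]  # 'i' (0x69) becomes 'h' (0x68)
d1 = hashlib.md5(original).digest()
d2 = hashlib.md5(tweaked).digest()
print(bits_changed(d1, d2), "of", len(d1) * 8, "output bits changed")
```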

We can use such one-way hash functions to verify that the files of our application have not been tampered with. If we know the original hash value of each file and compare it with the file's current hash value, any modified file will have a hash value that differs from the original.
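For a single file, the check boils down to hashing the file's current contents and comparing the result with the value recorded earlier. A minimal Python sketch; `file_digest` and `is_unmodified` are hypothetical helper names, and where the recorded digest comes from is left open.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks so large artifacts never need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unmodified(path, recorded_digest, algorithm="md5"):
    """True if the file's digest now equals the digest recorded when it was built."""
    return file_digest(path, algorithm) == recorded_digest
```

Reading in chunks produces the same digest as hashing the whole file at once; it only keeps memory use flat for large archives such as WAR or EAR files.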

File Checksum Integrity Verifier and md5deep

Fortunately, there are a few tools out there that will do this for us. In the Windows world, there is Microsoft's File Checksum Integrity Verifier (FCIV). This tool can be run on a directory tree to calculate the hash value of every file in the tree and store the results in an XML file. The same tool can then be run in a verification mode, in which it checks that all the files in a given directory tree have the same hash values as those recorded in a specified XML file.
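The record-then-verify workflow can be sketched in Python with the standard `xml.etree.ElementTree` module. This is not FCIV and does not reproduce its XML schema: the `<manifest>` element, the `<file path=... md5=...>` entries, and the function names are hypothetical, chosen only to illustrate the idea.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def relative_files(root):
    """Yield every file under root as a '/'-separated path relative to root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root).replace(os.sep, "/")


def write_manifest(root, xml_path):
    """Record mode: store the digest of every file in the tree in an XML file."""
    manifest = ET.Element("manifest")
    for rel in sorted(relative_files(root)):
        ET.SubElement(manifest, "file", path=rel,
                      md5=md5_of(os.path.join(root, *rel.split("/"))))
    ET.ElementTree(manifest).write(xml_path, encoding="utf-8", xml_declaration=True)


def verify_manifest(root, xml_path):
    """Verify mode: return (problem, path) pairs for missing or modified files."""
    problems = []
    for entry in ET.parse(xml_path).getroot().iter("file"):
        rel = entry.get("path")
        full = os.path.join(root, *rel.split("/"))
        if not os.path.isfile(full):
            problems.append(("missing", rel))
        elif md5_of(full) != entry.get("md5"):
            problems.append(("modified", rel))
    return problems
```

This sketch only checks files listed in the manifest, so a file added to the installation afterwards would go unnoticed; comparing the set of paths on disk with the set in the manifest closes that gap.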

In the open source, cross-platform world, there is md5deep. This tool does pretty much the same thing as FCIV: in one mode it records the hash value of every file in a directory tree, and in a verification mode it checks that each file's hash value matches the recorded value. The main difference is that md5deep stores the hash values and file paths in a flat text file, one line per file, rather than in the XML format used by the Microsoft tool.
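A flat, one-line-per-file manifest is just as easy to produce and check. The Python sketch below writes lines of the form `<hex digest>  <relative path>`; that layout resembles `md5sum`-style output, but treat it as an assumption rather than md5deep's exact format. The function names are hypothetical. Unlike a check that only walks the manifest, `compare` also reports files that appear on disk but were never part of the build.

```python
import hashlib
import os


def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root):
    """Map each file's '/'-separated path relative to root to its MD5 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root).replace(os.sep, "/")] = md5_of(full)
    return digests


def write_manifest(digests, manifest_path):
    """One line per file: '<hex digest>  <relative path>', sorted by path."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for rel in sorted(digests):
            out.write(f"{digests[rel]}  {rel}\n")


def read_manifest(manifest_path):
    """Parse a file written by write_manifest back into a dict."""
    digests = {}
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                digest, rel = line.rstrip("\n").split("  ", 1)
                digests[rel] = digest
    return digests


def compare(expected, actual):
    """Classify every difference between the recorded manifest and the tree now."""
    return {
        "modified": sorted(p for p in expected.keys() & actual.keys()
                           if expected[p] != actual[p]),
        "missing": sorted(expected.keys() - actual.keys()),
        "unexpected": sorted(actual.keys() - expected.keys()),
    }
```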

Using Message Digests in Your Deployment Process

Given the above discussion, it should be straightforward to integrate deployment verification into every build and deployment process. One of the artifacts produced by every build should be a database containing the name of every file to be deployed along with its message digest (hash value). This database would be stored in a flat text file or an XML file, depending on the tool used. After deployment, the installation should be verified against the digest database. Perhaps most importantly, the installation should be verified again before every promotion event. This means that before the code gets promoted from QA to production, the QA installation gets verified, just to make sure no one took any shortcuts.
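A promotion gate along these lines could be a small script that the build or deployment tool runs against the QA installation before promoting it. The Python sketch below is hypothetical: the function names, the command-line arguments, and the `<hex digest>  <relative path>` manifest layout are all assumptions. Its only job is to turn "the installation does not match the build" into a non-zero exit status that stops the promotion.

```python
import argparse
import hashlib
import os
import sys


def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_install(install_dir, manifest_path):
    """Return a list of problems; an empty list means the install matches the build."""
    problems = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            digest, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(install_dir, *rel.split("/"))
            if not os.path.isfile(full):
                problems.append(f"MISSING   {rel}")
            elif md5_of(full) != digest:
                problems.append(f"MODIFIED  {rel}")
    return problems


def main(argv=None):
    """Return 0 to let the promotion proceed, 1 to block it."""
    parser = argparse.ArgumentParser(description="Verify an installation before promotion.")
    parser.add_argument("install_dir", help="root of the installation to check, e.g. QA")
    parser.add_argument("manifest", help="digest manifest produced by the build")
    args = parser.parse_args(argv)
    problems = verify_install(args.install_dir, args.manifest)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Run as a script, this would end with `sys.exit(main())`, so that the deployment tool sees the non-zero status and halts the promotion. The same check can also run immediately after each deployment.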

There are certainly many other occasions where such verification practices are appropriate. For example, when faced with seemingly identical installations that behave differently, one of the first tests to run could be verifying each installation.

We did not get a chance to talk about incremental deployments this month as promised. Let's save that topic for an entire column in one of the months ahead.




© 2006-2007 Urbancode, Inc.
Anthill, AnthillPro, and AnthillOS are trademarks of Urbancode, Inc.
All other trademarks are owned by their respective owners.
tel: (216) 858-9000 fax: (216) 858-6902 email:info@urbancode.com