
Taking Continuous Integration to the next level

OK, the guys at work have been bugging me to write up some longer, more technical blog posts, so here goes. We have set up a new system for doing continuous integration on multiple projects with dependencies on each other, and I’ll describe what the problems were and how we solved them.

When doing Continuous Integration, the tools have advanced to the point where handling a single project is pretty straightforward. We've been using nAnt, CruiseControl.net, and subversion for several months and I am very happy with the results. But we have started moving beyond individual disconnected projects. We started doing CI at the company with two projects. One was dependent on the other, but we had been doing the integration between them manually. Project A would build under CC.net and generate a zip file with build artifacts - all the DLLs, config files, etc. that were needed by project B. But it was up to the developers on project B to grab that zip file and unzip it into their project, where they checked in those binary artifacts.

This worked fine - up to a point. But there are a couple of issues. The first is that any manual process like this screams out for automation - to make sure it always gets done correctly, and to catch problems early. The second problem with this approach is that it just doesn't scale well when you start talking about more complex project dependencies. Even with just two projects, we started running into things like both project A and project B having common dependencies, including things like log4net, nAnt, and nUnit, as well as some common pieces that came from within the company. If those shared dependencies got updated by one project, it often led to mismatches in the other project. Log4net was especially thorny because it is a strong-named assembly, and the two versions we were using had API changes that made them incompatible.

So we embarked upon a project to move dependencies that are shared between projects into a lower level shared project. There are several ways to accomplish this sort of thing. We’ve come up with a system using subversion that I think is unique, so I will describe that, then briefly describe doing something similar using the filesystem.

I think I first heard the term Enterprise Continuous Integration from Bill Caputo. Mike Roberts has also written an excellent paper on the topic, and there is a zip file with example code and ccnet config files at the cc.net website. Mike’s description of using the filesystem as a repository is similar to what I describe below – he goes into more detail than I will though.

Using subversion to manage dependencies

All of our projects are currently version controlled using subversion, so we started by setting up a new subversion repository that is dedicated to build artifacts. Subversion does a really good job of binary diffs, allowing you to store many different revisions of binary files without growing excessively large too quickly, but I will be monitoring its growth and performance over time. I segmented the repository into internal and external projects, and then I have a directory for each project any other project might depend on. We started with our build tools – nAnt, nUnit, etc., and then went on to add the artifacts of each project as we went along.

The next step was to get each project using those dependencies. Subversion has a feature called svn:externals, a special property that can be set on a directory. Each line of the property names a subdirectory and a repository URL, and subversion checks that URL out into the subdirectory whenever the parent directory is checked out or updated. We standardized on a common top level directory layout for each project – all dependencies go into a folder named bin under the project root, and then into a directory by project name. Here’s the output from running svn propget on the command line in one project’s bin folder:

C:\work\guide\bin>svn propget svn:externals
DeSade svn://dccode01/dependencies/trunk/datacert/sharedoc/DeSade
Fit svn://dccode01/dependencies/trunk/external/Fit
Interop svn://dccode01/dependencies/trunk/external/Microsoft
Marquis svn://dccode01/dependencies/trunk/datacert/sharedoc/Marquis
nant svn://dccode01/dependencies/trunk/external/nant
nCover svn://dccode01/dependencies/trunk/external/nCover
ndepend svn://dccode01/dependencies/trunk/external/ndepend
nDoc svn://dccode01/dependencies/trunk/external/nDoc
nhibernate svn://dccode01/dependencies/trunk/external/nhibernate
NUnit svn://dccode01/dependencies/trunk/external/NUnit
datacert svn://dccode01/dependencies/trunk/external/datacert
log4net svn://dccode01/dependencies/trunk/external/log4net
nmock svn://dccode01/dependencies/trunk/external/nmock
StructureMap svn://dccode01/dependencies/trunk/external/StructureMap
SideShow svn://dccode01/dependencies/trunk/datacert/sideshow
this svn://dccode01/dependencies/trunk/datacert/guide
thisUI svn://dccode01/dependencies/trunk/datacert/guideAdminUI

Note those last two entries - this and thisUI. Those are actually copies of the binaries from this project. Each project has a target named “archive” in its nAnt build. If the build succeeds (compiles, passes all unit tests and fitnesse regression tests) then the archive target copies all the build artifacts to the bin/this directory and then does an svn commit with a standard comment that includes the full version number, which includes the cruisecontrol label. At that point, any project that depends on that project can simply do an svn update and get the latest version of all the dependencies. This project actually has two separate sets of artifacts, so I have two folders – one for each set. Here is what the nAnt targets look like:

<target name="archive" depends="copyArtifacts,checkinArtifacts"/>

<target name="copyArtifacts" depends="init,version">
    <echo message="copying from ${build.dir} to ${dependenciesdir.guide}"/>
    <!-- if new files are added to the filesets, they will need to be svn add'ed to the dependencies project manually. Likewise for files removed. -->
    <copy todir="${dependenciesdir.guide}" overwrite="true" >
        <fileset basedir="${build.dir}">
            <include name="DataCert.Guide.dll" />
            <include name="DataCert.Guide.pdb" />
            <include name="DataCert.Guide.ShareDoc.dll" />
            <include name="DataCert.Guide.ShareDoc.pdb" />
            <include name="Rules/**/*.*" />
        </fileset>
    </copy>

    <copy todir="${dependenciesdir.guide}/testing" overwrite="true" >
        <fileset basedir="${build.dir}">
            <include name="DataCert.Guide.Fit.dll" />
            <include name="DataCert.Guide.Fit.pdb" />
        </fileset>
    </copy>

    <copy todir="${dependenciesdir.guideAdmin}" overwrite="true" >
        <fileset basedir="${build.dir}">
            <include name="StructureMap.dll" />
            <include name="AxInterop.SHDocVw.dll" />
            <include name="DataCert.Guide.dll" />
            <include name="DataCert.Guide.ShareDoc.AdministrationUserInterface.dll" />
            <include name="DataCert.Guide.ShareDoc.dll" />
            <include name="Interop.SHDocVw.dll" />
            <include name="log4net.dll" />
            <include name="RuleSetAdministrator.exe" />
            <include name="StructureMap.DataAccess.dll" />
        </fileset>
    </copy>
</target>

<target name="checkinArtifacts">
    <exec program="svn.exe" workingdir="${dependenciesdir.guide}" commandline="ci --username CruiseControl.net --password SuperSecret -m&quot;automatic check-in, cc.net label ${ccnet.label}&quot;" />
    <exec program="svn.exe" workingdir="${dependenciesdir.guideAdmin}" commandline="ci --username CruiseControl.net --password SuperSecret -m&quot;automatic check-in, cc.net label ${ccnet.label}&quot;" />
</target>

This has been working very well for the month or so it has been in place. It took a couple of days to get all the existing projects using the new system, re-do the .csproj files to change where the references were, and resolve version conflicts, but now everyone is on the same page.

There are a couple of rough spots with this system. If a project starts publishing new artifacts, those have to manually be added to the dependencies project in subversion. Same thing if an artifact is no longer published – it would have to be manually removed. Another issue is that svn:externals can only deal with things on a directory-by-directory basis, so if project B only needs a single file from Project A, or if it needs to put files in a certain location in its directory structure, that has to be accomplished during the build with nAnt moves and/or copies.
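The manual add/remove step could probably be scripted around svn status, which flags unversioned files with ? and missing files with ! in the first column of its output. Here is a minimal sketch in Python that only classifies the output and does not run svn itself. The helper name pending_adds_and_deletes and the two extra file names in the sample are made up for illustration.

```python
def pending_adds_and_deletes(status_output):
    """Split `svn status` output into paths to `svn add` and `svn delete`.

    Only the first column is examined: '?' means unversioned (a newly
    published artifact) and '!' means missing (an artifact that is no
    longer published). Assumes paths have no leading spaces.
    """
    to_add, to_delete = [], []
    for line in status_output.splitlines():
        if len(line) < 2:
            continue
        code, path = line[0], line[1:].strip()
        if code == "?":
            to_add.append(path)
        elif code == "!":
            to_delete.append(path)
    return to_add, to_delete


# Sample output; the Reports and Old DLL names are hypothetical.
sample = (
    "M       DataCert.Guide.dll\n"
    "?       DataCert.Guide.Reports.dll\n"
    "!       DataCert.Guide.Old.dll\n"
)
adds, deletes = pending_adds_and_deletes(sample)
print("svn add:", adds)
print("svn delete:", deletes)
```

The two lists could then be passed to svn add and svn delete just before the commit in the checkinArtifacts target.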

Another issue with the current system is that it is somewhat difficult for a developer to work with two dependent projects, modifying the source of each at the same time. In our “Project B depends on Project A” example, if a developer needs to modify project A and build it, then get the artifacts of A into project B, that is either a manual step, or they check in and let the build complete, then work on project B. One way to remediate this would be to have a standard nAnt target in every build – something like ‘updateLocalDependencies’ – that would copy the build artifacts from the local machine rather than depending on a subversion update.
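To show the shape of that idea outside of nAnt, here is a rough Python sketch of an ‘updateLocalDependencies’ step. It assumes project A's build output sits in a local directory and that project B follows the bin/<project name> layout described earlier. The function name, the glob patterns, and the ProjectA file names are placeholders, not part of our actual build.

```python
import shutil
import tempfile
from pathlib import Path


def update_local_dependencies(upstream_build_dir, downstream_root, dep_name, patterns):
    """Copy locally built artifacts of an upstream project into the
    downstream project's bin/<dep_name> folder, bypassing svn update."""
    source = Path(upstream_build_dir)
    target = Path(downstream_root) / "bin" / dep_name
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for artifact in sorted(source.glob(pattern)):
            if artifact.is_file():
                shutil.copy2(artifact, target / artifact.name)
                copied.append(artifact.name)
    return copied


# Demo with throwaway directories standing in for two working copies.
with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp, "projectA", "build")
    build_dir.mkdir(parents=True)
    (build_dir / "ProjectA.dll").write_text("binary stand-in")
    (build_dir / "ProjectA.pdb").write_text("symbols stand-in")
    (build_dir / "notes.txt").write_text("not an artifact")

    copied = update_local_dependencies(
        build_dir, Path(tmp, "projectB"), "ProjectA", ["*.dll", "*.pdb"]
    )
    landed = sorted(p.name for p in Path(tmp, "projectB", "bin", "ProjectA").iterdir())
    print(copied)
```

One thing to watch with this approach: the copied files show up as local modifications in the externals working copy, so they should be reverted rather than committed from the developer's machine.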

One thing we haven’t experimented with yet is doing branches or tags of dependencies, but this should be relatively straightforward. If project A branches and starts doing two builds – one for new development on the trunk, and one for a released version, it should likewise create a branch in the dependencies project for its artifacts, and make sure that the build sends those artifacts to that branch. Any project that depends on project A can then decide to pick up either the branch or the trunk version of project A. This same idea could also be applied to tags, so that the final released version of a project could be set up to get very specific tagged versions of its dependencies.
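Since we have not tried this yet, treat the following as a sketch only. Switching a dependency from trunk to a branch comes down to rewriting one line of the svn:externals property, which a few lines of Python can do. The function name retarget_external and the branches/rel3 repository path are made-up examples; the real branch layout in the dependencies repository is still undecided.

```python
def retarget_external(externals_text, local_dir, new_url):
    """Return the svn:externals value with one entry pointed at new_url.

    Assumes the "localdir URL" line format shown by svn propget above.
    """
    result = []
    found = False
    for line in externals_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == local_dir:
            result.append("%s %s" % (local_dir, new_url))
            found = True
        else:
            result.append(line)
    if not found:
        raise KeyError(local_dir)
    return "\n".join(result)


trunk_externals = (
    "log4net svn://dccode01/dependencies/trunk/external/log4net\n"
    "Marquis svn://dccode01/dependencies/trunk/datacert/sharedoc/Marquis"
)
# The branches/rel3 path below is a hypothetical layout for illustration.
branch_externals = retarget_external(
    trunk_externals,
    "Marquis",
    "svn://dccode01/dependencies/branches/rel3/datacert/sharedoc/Marquis",
)
print(branch_externals)
```

The rewritten text could be saved to a file and applied to the bin folder with svn propset svn:externals -F, followed by an svn update.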

Right now, we don’t have anything set up to automate the process of cascading the builds, but it is something that could be done without too much extra effort. What I will probably do is just add additional <sourcecontrol> blocks to my ccnet config file, so that the build for project B not only watches for changes to its source files, but also any changes to its dependencies.
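Whatever mechanism ends up triggering the builds, the underlying question is which projects need to rebuild, and in what order, when one project publishes new artifacts. Here is a small Python sketch of that calculation. It assumes the dependency graph has no cycles, and the dependency map in the example is illustrative rather than an exact picture of our projects.

```python
def downstream_builds(depends_on, changed):
    """Return the projects to rebuild after `changed` publishes new artifacts,
    ordered so each project builds after the projects it depends on."""
    # Invert the map: project -> projects that consume its artifacts.
    dependents = {}
    for project, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(project)

    order, visited = [], set()

    def visit(project):
        if project in visited:
            return
        visited.add(project)
        for consumer in sorted(dependents.get(project, [])):
            visit(consumer)
        order.append(project)  # post-order: added after all of its consumers

    visit(changed)
    order.reverse()   # reverse post-order is a topological order
    return order[1:]  # drop `changed` itself


# Illustrative dependency map (project -> what it depends on).
depends_on = {
    "guide": ["sharedoc", "log4net"],
    "guideAdminUI": ["guide", "sharedoc"],
    "sideshow": ["guide"],
}
print(downstream_builds(depends_on, "sharedoc"))
```

In this example a change to sharedoc rebuilds guide first, then the two projects that consume guide.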

What would be really nice at some point would be a system that has a central manifest of projects and their dependencies, probably stored as XML. From that, you could generate your ccnet.config file using XSL. You could also generate a dependencies graph using AT&T’s GraphViz toolset. But that is a topic for yet another post.
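As a teaser for that idea, here is a rough Python sketch that reads a small XML manifest and emits a GraphViz DOT graph. The manifest format (projects, project, and dependsOn elements) is invented for this example, since we have not designed the real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest format, invented for this sketch.
MANIFEST = """\
<projects>
  <project name="guide">
    <dependsOn name="sharedoc"/>
    <dependsOn name="log4net"/>
  </project>
  <project name="guideAdminUI">
    <dependsOn name="guide"/>
  </project>
</projects>
"""


def manifest_to_dot(manifest_xml):
    """Render project dependencies as a GraphViz digraph (DOT text)."""
    root = ET.fromstring(manifest_xml)
    lines = ["digraph dependencies {"]
    for project in root.findall("project"):
        for dep in project.findall("dependsOn"):
            # One edge per dependency: project -> what it depends on.
            lines.append('    "%s" -> "%s";' % (project.get("name"), dep.get("name")))
    lines.append("}")
    return "\n".join(lines)


dot_text = manifest_to_dot(MANIFEST)
print(dot_text)
```

Saving the output to a file and running it through GraphViz's dot tool (for example with -Tpng) would produce the picture.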

Using a file-based repository

The second technique is to use a file-based repository of build artifacts. The basic idea is that every project publishes its artifacts into a well-known location, using some sort of standard to include metadata about the artifacts. Project artifacts can cover a wide range of items - it could be jar files or DLLs, batch files/startup scripts, configuration files, even installable images like RPMs or MSI files. The metadata for the artifacts includes things like the version number, project name, perhaps a branch name or release status (alpha, beta, final, etc.). One extension to this idea is to have projects package these artifacts into a single package using zip, tar/gz, or something similar - this makes it more likely that the package will travel as a consistent set of files that are known to work together.
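To make the packaging idea concrete, here is a minimal Python sketch that bundles a set of artifacts and a metadata record into one zip file. The metadata file name artifact-info.json and the field names are assumptions made for this example, not a standard we actually used.

```python
import json
import tempfile
import zipfile
from pathlib import Path

METADATA_NAME = "artifact-info.json"  # hypothetical name for the metadata record


def package_artifacts(files, zip_path, metadata):
    """Bundle artifact files plus a metadata record into one zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for f in files:
            archive.write(f, arcname=Path(f).name)
        archive.writestr(METADATA_NAME, json.dumps(metadata, indent=2))
    return zip_path


def read_metadata(zip_path):
    """Read the metadata record back out of a published archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return json.loads(archive.read(METADATA_NAME))


# Demo with a throwaway directory and a stand-in artifact.
with tempfile.TemporaryDirectory() as tmp:
    dll = Path(tmp, "infra.dll")
    dll.write_text("binary stand-in")
    zip_path = Path(tmp, "infra_0405.zip")
    package_artifacts(
        [dll],
        zip_path,
        {"project": "infrastructure", "branch": "trunk", "status": "beta"},
    )
    info = read_metadata(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())
    print(info["project"], names)
```

Keeping the metadata inside the archive means it travels with the files, so a consumer can check what it received without relying on where the zip was found.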

This is the setup we used at my previous employer. We had a pretty complex system - at the time I left, the system was building over 15 different projects, with many interdependencies. Each project was typically doing development on more than one branch, and was built on multiple OS platforms. Most projects had a mix of C and/or C++ code, plus Java code. All used Ant and CruiseControl to build, and I was lucky enough to work with another really sharp developer who wrote a set of custom Ant tasks to handle creating versioned zip files with a consistent internal structure, publishing those to the central file server, retrieving those from the central file server (but only if a newer version was available), listing dependencies, and managing dependencies between projects from a central configuration file.

To make this more concrete, here is the basic system we used. Names have been changed to protect the innocent and shield the guilty from blame.

Each project had similar Ant builds, with similar targets. Each build had a 'cruise' target that did the compile and ran the unit tests. If all the tests passed, the cruise target would then create an archive zip file with a consistent directory structure. These then got copied to a central file server that used directory names for metadata. For example, you might get to a project's latest artifacts by going to

\\filer\builds\infrastructure\trunk\win32\latest\infra_0405.zip

If you wanted a build from a branch, you might find it at

\\filer\builds\infrastructure\branch_rel3\win32\latest\infra_0302.zip

We also had directories for released versions of the code, like this:

\\filer\builds\infrastructure\released\release3\win32\infra_0604.zip
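Because the directory names carry the metadata, finding an artifact is just a matter of assembling a path. Here is a small Python sketch of that convention, following the three example paths above. The function names are made up, and the staleness check (fetch whenever the published zip name differs from the local one) is a simplifying assumption, not how the original Ant tasks decided what counted as newer.

```python
def artifact_dir(root, project, platform, branch="trunk", release=None):
    """Build the directory that holds a project's published zip.

    Released builds live under released\\<release>\\<platform>; everything
    else lives under <branch>\\<platform>\\latest, as in the examples above.
    """
    if release is not None:
        parts = [root, project, "released", release, platform]
    else:
        parts = [root, project, branch, platform, "latest"]
    return "\\".join(parts)


def needs_fetch(published_zip, local_zip):
    """Assumed rule: fetch when we have nothing or the names differ."""
    return local_zip is None or published_zip != local_zip


trunk_dir = artifact_dir(r"\\filer\builds", "infrastructure", "win32")
branch_dir = artifact_dir(r"\\filer\builds", "infrastructure", "win32", branch="branch_rel3")
release_dir = artifact_dir(r"\\filer\builds", "infrastructure", "win32", release="release3")
print(trunk_dir)
print(branch_dir)
print(release_dir)
```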

Summary

Using subversion as the dependency repository works quite well for us – but right now we are a small team with lots of control over our environment. Having a larger team, or not being able to use subversion everywhere, or having a mixed Windows/Linux environment might lead you to make different choices. The key thing is to do something to make your life easier. Always remember Larry Wall’s three great virtues of a programmer: laziness, impatience, and hubris. Feed your laziness and automate yourself out of the boring parts of your job.

Further reading

Apache Gump – Gump is a project used by the Apache Jakarta sub-projects to do Continuous Integration at an amazing level. Every night, it gets the latest source code for (at this time) 779 open source projects. It then builds them all, using the results of lower-level projects to build higher-level projects. It sends email to interested parties to let them know, for example, if a change they made broke a project that depended on them, giving early warning of potential problems. See the latest nightly status to get an idea of the scope of this project. Java-centric.

Maven – Maven is a build tool, similar to ant or nAnt, but builds are just the beginning of its capabilities. It also comprehends metadata about projects, and from that can do things like manage dependencies, generate documentation, and generate a website about the project. They have a public repository of project artifacts and tools for managing and updating a project’s dependencies. See this section of the documentation for more information. Java-centric.

CPAN – the Comprehensive Perl Archive Network is a centralized repository of Perl Modules that you can use to make sure you have the latest versions of those modules. This is a great example of a community sharing its resources.



Re: Taking Continuous Integration to the next level

Since you've blogged about NDepend, I wanted to let you know that NDepend 2.0 has been released with some major enhancements such as an interactive view of your application and a language dedicated to query and constraint the structure of your code: Code Query Language. http://www.NDepend.com http://www.ndepend.com/CQL.htm Cheers, Patrick Smacchia MVP.NET
