Selecting a Revision Control Tool

Introduction

You definitely want a revision control tool, not just because you’re reading this, but because everyone produces and updates information, and revision control eliminates the risk of destroying your information in the process of updating it.

At first glance the choice of revision control tools is daunting: Wikipedia lists more than 25 different options. Fortunately, within the realm of modern software development it is easy to pare this list down, and I will only be examining four options. I discount all of the proprietary offerings, partly because the free software options are so effective that I’ve never needed to use any of them, and partly because if you’re paying for them you should be getting support from the supplier instead of me. Next, I discount all but the most popular free software options, because it’s an unfortunate reality that technical superiority is secondary in importance to familiarity for users. Finally, I discount CVS as the champion whose time has passed.

The remaining options are Subversion (SVN) amongst centralized tools, and the trio of distributed tools: Git, Mercurial (hg), and Bazaar (bzr). The good news is that for small to medium sized corporate or open source software projects all of these tools are appropriate and capable of satisfying the source code control needs. The bad news is that making the final choice based on minor differentiating features can be fairly difficult. This is further complicated by the fact that all these tools are being actively developed and a useful feature is unlikely to remain unique to a single project for long.

Criteria for Selection

In order to make your decision you’ll need to assemble a list of criteria. The following list provides some general ideas about what might be important for your decision:

Project Structure and Development Process

  • Are all the committers gathered in a single location or are they geographically and networkologically diverse? If in an office environment, do you want to support working from other locations?
  • Does the project involve working with large and/or unmergeable artifacts such as images, office documents or Flash projects? Will many people be accessing these documents, or just a few who can coordinate easily?
  • Do you have a large team or strict quality constraints such that you want a dedicated “gatekeeper” to control how code arrives on the main lines of development?
  • How big is your repository in commits, files and bytes? What are your performance constraints?
  • What branching and merging strategy are you going to use?

Usability

  • How sophisticated are your users, and how easy does the tool need to be? How involved will the users be in branching, and what are the consequences of any mistakes on their part?
  • How much effort will it be to maintain any shared repositories?
  • What sort of tools will you want to integrate revision control with? How much effort will it be to customize the tools to your needs?

Tool’s viability and compatibility

  • How strong is the development team supporting the tool, and how large is the community using it? Are they open to new ideas and do they accept patches? Does the same apply to the technology the tool is built upon?
  • Does the tool use similar technology to your project? Will it grow in the same direction as your project is likely to?
  • Does the legal and political position of the tool matter to you? Are you comfortable with the license and will it be well protected in the case of legal problems or potential IP infringement cases? Is the tool protected by the same groups as other tools you already use?
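
One way to turn a list of criteria like the one above into a decision is a simple weighted score. The sketch below is only an illustration: the criterion names, the weights, the tool names and every score are hypothetical placeholders, not measurements of any real tool.

```python
# Hypothetical weights (importance 1-5) for a few criteria; replace with your own.
weights = {
    "offline_work": 4,
    "file_locking": 2,
    "ease_of_adoption": 3,
}

# Hypothetical scores (0-5) per candidate; "ToolA" and "ToolB" are placeholders.
scores = {
    "ToolA": {"offline_work": 5, "file_locking": 0, "ease_of_adoption": 2},
    "ToolB": {"offline_work": 2, "file_locking": 5, "ease_of_adoption": 4},
}


def weighted_total(tool_scores, weights):
    """Sum of weight * score over every criterion; a missing score counts as 0."""
    return sum(weight * tool_scores.get(name, 0) for name, weight in weights.items())


def rank_tools(scores, weights):
    """Return (tool, total) pairs sorted with the highest total first."""
    totals = {tool: weighted_total(s, weights) for tool, s in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for tool, total in rank_tools(scores, weights):
        print(f"{tool}: {total}")
```

The totals are only as good as the weights behind them, so the main value of the exercise is that it forces you to state which criteria actually matter to your project.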

Distributed and Centralized models

In a centralized model revision control system there is one repository which is shared by all users who must contact it for history information and to commit changes. In a distributed model each user has their own repository which contains the full revision history and enables them to browse history and do commits locally. Locally committed changes may then be shared with others, typically by making them available on a server for other people to pull them or by pushing them up to a shared repository somewhere.
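
The difference between the two models can be sketched with a toy repository class. This is a deliberately simplified model, not the data structure of any of the tools discussed here: the `Repository` class and its methods are hypothetical, and history is reduced to a plain list of commit messages.

```python
class Repository:
    """Toy repository: history is just an ordered list of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def commit(self, message):
        """Record a change in this repository only; no network is involved."""
        self.history.append(message)

    def clone(self):
        """A distributed clone receives its own copy of the full history."""
        return Repository(self.history)

    def push_to(self, other):
        """Send the commits `other` is missing.

        Assumes `other` has not diverged; a real tool would ask for a merge.
        """
        shared_length = len(other.history)
        if other.history != self.history[:shared_length]:
            raise ValueError("remote has diverged; pull and merge first")
        other.history.extend(self.history[shared_length:])


if __name__ == "__main__":
    shared = Repository(["initial import"])
    local = shared.clone()
    local.commit("fix typo")      # local commit, invisible to others
    local.commit("add tests")
    print(shared.history)         # still only the initial import
    local.push_to(shared)         # now the work is shared
    print(shared.history)
```

In the centralized model there is no `clone` step: every `commit` goes straight to the shared repository, which is why each commit needs a network round trip.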

Distributed systems have evolved from centralized ones in order to address network and branch management issues arising in projects with many geographically spread developers. Local commits are much faster, which encourages developers to split their work into a larger number of smaller chunks, resulting in a more comprehensible history. Combined with faster history browsing, this results in developers having a better understanding of the code base.

Most distributed tools are still able to support centralized workflows because they have the functionality for pushing changes to a remote repository, much like the centralized tools do on commit. However, because the repositories are in fact detached, they cannot support file locking features to prevent multiple users from simultaneously editing the same file, which can be a critical feature for projects that collaborate with unmergeable source files such as images or complex office documents.

Workflows that involve a lot of branching are typically easier to manage using a distributed system because each developer’s repository essentially becomes a new branch or set of branches in the project. For example, it enables small subteams to collaborate outside the central repository with minimal project maintenance overhead, as well as enabling the “gatekeeper” model used in Linux development, where changes must be manually accepted by someone responsible for branch stability. These workflows can also be implemented in centralized systems, but will result in a lot of branches in the repository along with attendant repository maintenance.

Subversion in Depth

Subversion uses a centralized model, and the only repository data stored in the working area is a copy of the files checked out from the server in order to support quick diff operations. The official project only provides a command line client, but there are many graphical clients including plugins for IDEs and file explorer tools. It also comes with a wealth of administrative tools. As the oldest of the VCS tools it has the most special case features, although the architectural changes of the distributed tools make some of these irrelevant.

Created to replace CVS, Subversion follows many of its design decisions while improving the most problematic areas. This makes it an easy transition for CVS users, and combined with the relative simplicity of the centralized model makes this the easiest of the tools to adopt.

Subversion is implemented in C and maintained for various POSIX style platforms, Mac OS X and MS Windows. Unofficial packages are available on many platforms, linked from the official project page. SVN started as a part of the Tigris project in 2000 and moved to the Apache Foundation in early 2010.

Project Structure and Development Process

As a centralized tool, Subversion works best with smaller numbers of committers who all have good network connections to the repository. Large numbers of committers generate a lot of traffic for the server which quickly becomes a bottleneck and necessitates complex redundancy and synchronization schemes.

One unique feature of Subversion compared with the distributed tools is file locking, which allows users to reserve write access to unmergeable files.
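
The idea behind exclusive locking can be sketched as a single lock table held by the central server. This is a toy model under stated assumptions, not Subversion's implementation or API: the `LockTable` class, its method names and the example users are all hypothetical.

```python
class LockTable:
    """Toy model of server-side exclusive locks, keyed by file path."""

    def __init__(self):
        self._owners = {}  # path -> user currently holding the lock

    def lock(self, path, user):
        """Try to take the lock; True if `user` holds it afterwards."""
        owner = self._owners.setdefault(path, user)
        return owner == user

    def unlock(self, path, user):
        """Release the lock, but only if `user` is the one holding it."""
        if self._owners.get(path) == user:
            del self._owners[path]
            return True
        return False

    def can_commit(self, path, user):
        """A path is writable if nobody holds its lock, or `user` does."""
        owner = self._owners.get(path)
        return owner is None or owner == user


if __name__ == "__main__":
    locks = LockTable()
    print(locks.lock("logo.png", "alice"))       # alice reserves the image
    print(locks.can_commit("logo.png", "bob"))   # bob must wait
    locks.unlock("logo.png", "alice")
    print(locks.can_commit("logo.png", "bob"))   # bob may now commit
```

The scheme only works because every client consults the same table, which is why it fits a centralized server and has no obvious home among detached repositories.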

While it can support workflows that involve a lot of branches, including the gatekeeper model, this comes at a cost of clutter and potential management overhead from storing many branches centrally. Relocating a series of changes from one branchpoint to another is an operation which is particularly difficult in Subversion.

Subversion’s biggest merging weakness is that directory moves cannot be correctly merged with changes to files in those directories. This means that ideally directories should only be moved when there are no branches that will be merged in the future, which can be very difficult to achieve in some development scenarios. Furthermore, because merges aren’t tracked robustly, the process of doing merges is delicate and the potential for false conflicts is high.

Usability

The centralized model is significantly easier to comprehend which means that Subversion is easier to adopt than the distributed tools. It also has a much clearer separation between administrative and normal operations which reduces the possibility of damage to the repository due to user error. Similarly repository structure and maintenance is a lot clearer, reducing the chances of administrative error.

Subtle problems can occur within a Subversion working directory due to mismatched files and occasionally present a serious usability hurdle. This is because each file and directory is tracked independently within the working directory and they are able to get out of sync, particularly after a commit or partial update.
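
How a working directory drifts out of sync can be sketched by recording a revision number per path. This is a simplified model for illustration only: the `WorkingCopy` class is hypothetical and ignores how Subversion actually stores its metadata.

```python
class WorkingCopy:
    """Toy working directory that tracks a revision number for each path."""

    def __init__(self, paths, revision):
        self.revisions = {path: revision for path in paths}

    def commit(self, paths, new_revision):
        """Only the committed paths move to the new revision."""
        for path in paths:
            self.revisions[path] = new_revision

    def update(self, head, paths=None):
        """Bring the given paths (default: all of them) up to `head`."""
        targets = list(self.revisions) if paths is None else paths
        for path in targets:
            self.revisions[path] = head

    def is_mixed(self):
        """True when the tracked paths are not all at the same revision."""
        return len(set(self.revisions.values())) > 1


if __name__ == "__main__":
    wc = WorkingCopy(["main.c", "util.c"], revision=10)
    wc.commit(["main.c"], new_revision=11)   # util.c is still at revision 10
    print(wc.is_mixed())
    wc.update(head=11)                       # a full update realigns everything
    print(wc.is_mixed())
```

In this model a full update is what brings the paths back into line, whereas a commit or a partial update leaves the rest of the directory behind.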

Subversion is a fairly mature project with a solid market share, and many tools have already been integrated with it. This also reflects back on the code base itself which has a fairly mature system for hooking in additional integration code.

Viability and compatibility

Subversion has all the hallmarks of a stable free software project. It is an official Apache Foundation project with financial support from CollabNet and other companies. It is widely used in corporate environments, as well as being provided by the majority of software project hosters. Additionally, it uses the Apache Portable Runtime and is written in C, which is extremely well entrenched. It is distributed under the Apache License.

Git in Depth

Git uses a distributed model and includes arbitrary local branches in each repository. The official project only provides a command line client, although there are graphical interfaces and plugins available from other projects with varying levels of completeness.

Designed and developed by kernel developers, Git’s focus is on internal structures and interfaces rather than on end user usability.

Git is implemented in C and maintained for POSIX style platforms. A less mature MS Windows port and GUI client are produced by other projects. It was started in 2005 in order to manage the Linux kernel source, and is developed under the same umbrella as the Linux kernel itself.

Project Structure and Development Process

Git has the typical benefits of the distributed tools described earlier.

A Git repository can contain multiple branches internally, which means that a shared repository can efficiently store many branches. Similarly this adds some flexibility to the handling of working directories for creating collaborative branches. Of course, the gatekeeper model is Git’s main focus since that’s the development process used by the Linux kernel.

Git is not particularly well suited to managing unmergeable files because it is unable to provide any sort of locking mechanism. Furthermore, unmergeable files tend to be large, so replicating the entire repository may start to become an expensive operation.

Usability

Git is the most difficult to learn of the four tools. It is designed in two layers, both of which are exposed to the user, and a large number of command line options are available. It is all documented, but presents an intimidating mass of options to the new user. A GUI for MS Windows, based on TortoiseSVN, is available, although it may not cover all required functionality; in particular, rebase may not be available.

Perhaps the biggest usability concern with Git is repository maintenance. The documentation requires a reasonably in-depth understanding of the repository structure in order to access basic administrative commands. The commands themselves are very powerful and have the potential to damage the repository, so care must be taken to fully understand what is being done. Git is best used with a clearly stated branching strategy, and recipes for managing that strategy.

Viability and compatibility

Git is essentially part of the same project which produces the Linux kernel, which gives it plenty of stability and support. Its development community is satisfactorily large, and it enjoys a significant amount of enthusiastic support from some of its users.

Git is distributed under the GPL version 2 only.

Bazaar in Depth

Bazaar supports both the distributed and centralized models, allowing working directories to be created with or without a local repository, providing flexibility in how projects are organized. The official project provides a command line client as well as several GUI clients.

The project arose from GNU Arch, an older distributed tool. Bazaar seems focussed on usability and the pragmatic side of software development, leading it to implement features ignored by other distributed tools. Mercurial and Git have a higher profile than Bazaar, most likely because of their association with the Linux kernel.

Bazaar is implemented in Python and is maintained for various POSIX style platforms, MS Windows and Mac OS X. It was started in 2005 by Canonical Ltd., the creators of the Ubuntu distribution, and became a part of the GNU Project in 2008.

Project Structure and Development Process

Bazaar has the typical benefits of the distributed tools described earlier.

Additionally, Bazaar’s unique ability to support both distributed and centralized clients on a single project means that additional, more obscure scenarios can be managed. While Bazaar should be able to support file locking when using its centralized mode, this functionality is not in the core application and hopefully can be provided by plugins.

Bazaar does not support multiple branches contained within a single repository, so having multiple shared branches means creating many copies of the repository itself, with the attendant disk-space and management overheads.

Usability

Usability is an important factor in the development of Bazaar, to the extent that it’s the only one of these tools which ships with a GUI client. The ability to work in a centralized way also makes for a very easy introduction for users.

Like the other distributed tools, the command line application mixes administrative commands with basic ones, which can be a source of confusion.

Compared with the other options, Bazaar is integrated with fewer third party tools such as IDEs and Continuous Integration systems, probably due to having a lower profile.

Viability and compatibility

Bazaar may have the smallest community of the projects, but it has become a part of the GNU project and is supported by Canonical Ltd., who have had significant success with their Linux distribution, Ubuntu.

Bazaar is distributed under the GPL version 2 or later.

Mercurial in Depth

Mercurial uses a distributed model and permits multiple named local branches in each repository. The official project provides a command line client, and endorses the independent TortoiseHg graphical interface which integrates with MS Windows file explorer and the Gnome file explorer.

Mercurial’s origin is in open source software development, and its development seems to be headed in that direction; for example, patch management with Mercurial Queues is given lots of coverage in the manual.

Mercurial is implemented in Python and maintained for POSIX style platforms, Mac OS X and MS Windows. It was started in 2005 to support Linux kernel development, but was passed over in favour of Git which was developed by Linus himself, and now continues as an independent project.

Project Structure and Development Process

Mercurial has the typical benefits of the distributed tools described earlier.

Being able to store multiple branches within a repository provides efficient storage for workflows that involve many branches, including private collaborative branches. It also provides good support for patch management, which can be particularly useful when working extensively with open source software. Other common open source needs may also be easily addressed by Mercurial.

Mercurial is not particularly well suited to managing unmergeable files because it is unable to provide any sort of locking mechanism. Furthermore, unmergeable files tend to be large, so replicating the entire repository may start to become an expensive operation.

Usability

Learning the basics of Mercurial is quite straightforward, and the more difficult concepts and dangerous administrative commands are not in the forefront of the documentation. On MS Windows a graphical interface integrated with the file explorer is suggested to improve ease of use.

Many IDEs provide integration with Mercurial.

Repository management and the more advanced features provide some usability hurdles.

Viability and compatibility

Mercurial is in use by plenty of high-profile open source projects and has a decent-sized community of contributors. It is a member of the Software Freedom Conservancy, allowing it to receive donations and providing it with legal defence. The project home page is hosted on the same server as the lead developer’s consulting pages, and perhaps it’s just me, but I get frequent DNS errors on their domain.

It is distributed under the GPL version 2 or later.

Detailed Feature Comparison

| Feature | Bazaar | Git | Mercurial | Subversion |
| --- | --- | --- | --- | --- |
| General Features | | | | |
| Centralized Development | Yes | Yes | Yes | Yes |
| Distributed Development | Yes | Yes | Yes | |
| Good Performance on Large Repositories | Yes, Yes, Yes (columns unclear) | | | |
| Good Repository compression | Yes, Yes, Yes (columns unclear) | | | |
| Read & Write Protocols | Native, ssh, HTTP | ssh | ssh, HTTP | Native, ssh, HTTP |
| Workflow Features | | | | |
| Can Access Multiple Repositories in a Single Working Directory | Yes (column unclear) | | | |
| Checkout without Copying Repository | Yes | | | Yes |
| Exclusive File Locking | | | | Yes |
| Multiple Branches in a Repository | | Yes | Yes | Yes |
| Advanced Merge Algorithms | Yes, Yes (columns unclear) | | | |
| Moving branches (rebase) | Plugin | Yes | Plugin | |
| Administrative Features | | | | |
| Repository Hooks | Yes | Yes | Yes | Yes |
| Client Extensions/Plugins | Yes | | Yes | |
| Requires Regular Repository Maintenance | | Yes | | |
| Usability Features | | | | |
| Official GUI Tool | Yes | | | |
| “Tortoise” file explorer tool project | Official | Young | Yes | Mature |
| Eclipse Integration | Plugin | Plugin | Plugin | Plugin |
| Microsoft Visual Studio Integration | Plugin, Plugin (columns unclear) | | | |
| NetBeans Integration | Plugin, Full, Full (columns unclear) | | | |
| Staging Area Before Committing | | Yes | | |
| Shelving Local Changes Temporarily | Yes, Plugin (columns unclear) | | | |
| Horizons – taking part of the repository | TBD | TBD | TBD | TBD |
| Integration Features | | | | |
| Integrated Web Code Review Tool | Gerrit, Rietveld (columns unclear) | | | |
| Integrated Email Code Review Tool | PQM | | | |
| Web based repository browsing | Yes | Yes | Yes | Yes |
| Bug Tracker Integration | TBD | TBD | TBD | TBD |
| Read only repository via static HTTP | Yes, Yes (columns unclear) | | | |
| Technical Details | | | | |
| Language | Python | C | Python | C |
| License | GPLv2+ | GPLv2 | GPLv2+ | Apache |
| Founding Year | 2005 | 2005 | 2005 | 2000 |
| Version compared | 2.2.0 | 1.7.3 | 1.6.3 | 1.6.12 |
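
A comparison like this can also be used as a filter for hard requirements. The sketch below encodes only four features that the prose sections above state explicitly (distributed operation, exclusive file locking, multiple branches in one repository, and an official GUI); the feature keys and the `tools_with` helper are hypothetical names chosen for illustration.

```python
# Feature sets taken from the prose sections above; the keys are made-up labels.
FEATURES = {
    "Bazaar": {"distributed", "official_gui"},
    "Git": {"distributed", "multiple_branches_per_repo"},
    "Mercurial": {"distributed", "multiple_branches_per_repo"},
    "Subversion": {"file_locking", "multiple_branches_per_repo"},
}


def tools_with(required):
    """Return the tools (alphabetically) that offer every required feature."""
    required = set(required)
    return sorted(tool for tool, offered in FEATURES.items() if required <= offered)


if __name__ == "__main__":
    print(tools_with({"distributed", "multiple_branches_per_repo"}))
    print(tools_with({"distributed", "file_locking"}))  # no tool offers both
```

The second query returning nothing is the trade-off discussed earlier: among these four features, exclusive locking and distributed operation never appear in the same tool.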

Published: 2011-01-30