InfoQ

Distributed Version Control Systems: A Not-So-Quick Guide Through

Posted by Sebastien Auvray on May 07, 2008 05:02 PM

Since Linus Torvalds' presentation at Google about git in May 2007, adoption of and interest in Distributed Version Control Systems (DVCS) have been rising steadily. We will introduce the concept of distributed version control, see when to use it and why it may be better than what you're currently using, and look at three players in the area: git, Mercurial and Bazaar.

What?

A Version Control System (or SCM) keeps track of several revisions of the same unit of information. It is commonly used in software development to manage the source code of a project. Historically, the first VCS of choice for projects was CVS, started in 1986. Since then many other SCMs have flourished, each with specific advantages over CVS: Perforce (1995), CVSNT (1998), Subversion (2000), ...

In 2002, to manage the mainline kernel sources, Linus chose BitKeeper, which he described as "the best tool for the job". Before that, Linus integrated each patch manually. While all its predecessors used a client/(central) server model, BitKeeper was the first VCS to allow a truly distributed system in which everybody owns their own master copy. Due to licensing conflicts, BitKeeper was later abandoned in favor of git (Apr, 2005). Other systems following the same model are available: Mercurial (Apr, 2005), Bazaar (Mar, 2005), darcs (Nov, 2004), Monotone (Apr, 2003).

Why?

Or, more precisely: why are centralized VCSs (notably Subversion) not satisfying?
Several shortcomings are blamed on Subversion:

  • The major reason is that branching is easy but merging is a pain (and one is of little use without the other). Any sizeable project you work on will likely need to juggle split, dev and test branches easily. Subversion has no history-aware merge capability, so its users must manually track exactly which revisions have been merged between branches, which is error-prone.
  • No way to push changes to another user (without submitting to the Central Server).
  • Subversion fails to merge changes when files or directories are renamed.
  • The trunk/tags/branches convention can be considered misleading.
  • Offline commits are not possible.
  • .svn files pollute your local directories.
  • svn:externals can be painful to handle.
  • Performance
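
To make the "history-aware merge" point concrete, here is a minimal Python sketch. It assumes a toy history stored as a graph in which each revision lists its parents; the names `History`, `ancestors` and `revisions_to_merge` are hypothetical and do not come from any of the tools discussed. With such a graph, a tool can work out by itself which revisions still need merging, instead of asking the user to remember them.

```python
from typing import Dict, List, Set

# Hypothetical toy history: each revision id maps to its parent revision ids.
History = Dict[str, List[str]]


def ancestors(history: History, rev: str) -> Set[str]:
    """Return rev plus every revision reachable through parent links."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(history[current])
    return seen


def revisions_to_merge(history: History, source: str, target: str) -> Set[str]:
    """Revisions reachable from source that target has not yet incorporated."""
    return ancestors(history, source) - ancestors(history, target)


history: History = {
    "a": [],
    "b": ["a"],        # work on the main line
    "c": ["a"],        # work on a branch
    "d": ["c"],        # more work on the branch
    "m": ["b", "c"],   # the main line merges the branch as of c
}

# After the merge m, only d is still missing from the main line.
still_missing = revisions_to_merge(history, "d", "m")
```

Because the merge revision records both of its parents, a later merge of the same branch only has to consider `d`. Without that recorded history, somebody has to keep this bookkeeping by hand.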

 

Modern DVCSs fixed those issues, both through their own implementation tricks and through the fact that they are distributed. But as we will see in the conclusion, Subversion has not given up yet.

How?

Decentralization

Distributed Version Control Systems take advantage of the peer-to-peer approach. Clients can communicate with each other and maintain their own local branches without having to go through a central server or repository. Synchronization then takes place between peers, who decide which changesets to exchange.

This results in some striking differences and advantages from a centralized system:

  • No canonical, reference copy of the codebase exists by default; only working copies.
  • Disconnected operations: Common operations such as commits, viewing history, diff, and reverting changes are fast, because there is no need to communicate with a central server. A central server can still exist (for a stable, reference or backup version), but if distribution is used well it should be queried far less often than in a centralized (CVCS) setup.
  • Each working copy is effectively a remote backup of the codebase and change history, providing natural security against data loss.
  • Experimental branches: creating and destroying branches are simple, fast operations.
  • Collaboration between peers is made easy.
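
The points above can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions, not how any of the tools is implemented: the `Repo` class and its `commit`, `clone` and `pull` methods are hypothetical, and a changeset is reduced to an id plus a description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Repo:
    """Hypothetical peer: a name plus the changesets it holds, keyed by id."""
    name: str
    changesets: Dict[str, str] = field(default_factory=dict)

    def commit(self, changeset_id: str, description: str) -> None:
        # A commit only touches local state, so it works while disconnected.
        self.changesets[changeset_id] = description

    def clone(self, name: str) -> "Repo":
        # A clone carries the full history, which is why it doubles as a backup.
        return Repo(name, dict(self.changesets))

    def pull(self, other: "Repo") -> int:
        """Copy the changesets that other has and we lack; return how many."""
        missing = {cid: desc for cid, desc in other.changesets.items()
                   if cid not in self.changesets}
        self.changesets.update(missing)
        return len(missing)


alice = Repo("alice")
alice.commit("c1", "initial import")

bob = alice.clone("bob")
bob.commit("c2", "fix typo")         # local to bob, no server involved
alice.commit("c3", "add feature")    # local to alice

pulled = alice.pull(bob)             # direct peer-to-peer exchange
```

After the pull, alice holds c1, c2 and c3 while bob still holds only c1 and c2: each peer decides when, and from whom, it takes changes. A real DVCS additionally has to merge the two lines of work, which this sketch leaves out.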

 

For an introduction to DVCS collaboration practices, have a look at the Intro to Distributed Version Control (Illustrated) or at possible collaboration workflows.

You should also be aware that opting for a DVCS has some disadvantages, notably in terms of complexity. The decentralized view is very different from the centralized world, and your developers may need some time to get used to it. Tracking changesets instead of files can also be confusing, even though it is very powerful and makes it theoretically possible to track a method as it moves between files.

Who?

The battle rages on! Some of the Good and the Bad.

The good and the bad below come essentially from a compilation of blogs, updated because some old arguments no longer hold, and from my personal experience.
Note that this is a very short list of features (e.g. git has more than 150 commands), and some issues may be more critical than others.

  | git | Mercurial | Bzr

Project
Maintainer | Junio C Hamano | Matt Mackall | Canonical Ltd. - Became GNU project
Concurrency model | Merge | Merge | Merge
License | GPL | GPL | GPL
Platforms supported | POSIX, Windows, Mac OS X | Unix-like, Windows, Mac OS X | Unix-like, Windows, Mac OS X
Cost | Free | Free | Free
Maturity
Version | > 1.0 (1.5.5) | > 1.0 (1.0) | > 1.0 (1.3.1)
Project Start | Apr, 2005 | Apr, 2005 | Mar, 2005
Implementation
SLOC (without Test src)
SLOC Count | 130550 | 38172 | 79864
Test Suites | ~20% of sources dedicated to Tests | ~25% of sources dedicated to Tests | ~50% of sources dedicated to Tests
History model | Snapshot | Changeset | Snapshot
Repo. growth | O(patch) | O(patch) | O(patch)
Network protocols | HTTP, FTP, email bundles, custom, ssh, rsync | HTTP, ssh, email | HTTP, SFTP, FTP, ssh, custom, email bundles
Basic Features      
Atomic commits
File renames  implicit
Merge file renames
Symbolic links
Pre/post-event hooks
Signed revisions  Partial / Manual verification
Merge tracking
End of line conversions  Planned (1.6)
Tags
International Support  Planned
Partial checkout  Use submodules instead  Planned  Planned
Model / Architecture      
File | Single top-level .git directory | Single top-level .hg directory | Single top-level .bzr directory
Model | | Simple branch model (a clone is a branch) | Simple branch model (a clone is a branch)
Repository Specificities | | | Shared repositories for sharing revisions between branches.
Supposed-to-be Better Storage Model
Directories versionable
Submodules | Submodule support via git-submodule | Submodule support via the Forest extension (as used by OpenJDK) | Workaround with 3rd Party tool ConfigManager
Per file commit | Goes against architecture | Goes against architecture | Goes against architecture
Rebase / Queue | rebase | Mercurial Queues | Rebase plugin, Loom plugin (comp. with Quilt)
Web Access      
  | Note: Repository can also be shared read-only via static files over HTTP. | Note: Repository can also be shared read-only via static files over HTTP. | Not as good as the 2 others. Faster Smart Server now available.
  | gitweb, wit, cgit | hgweb (single rep), hgwebdir (multi rep) | webserve, trac, Loggerhead
Integration      
Integration-ability  git is more scriptable than integrable through API (even if there are some frontend api like Ruby/Git)  Rich API
Migration | Good. git-svn is also a very powerful and easy-to-set-up bidirectional gateway between Subversion and Git, allowing you to use Git over an existing Subversion repository. | Good. hgsvn is not as polished as git-svn. | Well covered but slow.
Issue Tracker Integration | Trac Versioning System Backend Plugin avail. Bugzilla workaround. No JIRA Plugin. | Trac Versioning System Backend Plugin avail. Bugzilla avail. JIRA. | Trac Versioning System Backend Plugin avail. Bugzilla avail. No JIRA Plugin.
IDE Plugins | Existing dev versions: Idea, Eclipse, NetBeans | Existing dev versions: Idea, Eclipse, NetBeans | Existing dev versions: Idea, Eclipse. Missing: NetBeans
Plugins | Emacs / Vim / ... | Emacs / Vim / ... | Emacs / Vim / ...
Performance      
  | git has historically been faster than its competitors. | | bzr has historically been the slowest of the 3.
Advanced Features
  | With more than 150 binaries, it's hard not to find the killer command you always dreamt of (even if this increases complexity). | |
Complexity
  | | | Bzr claims to hide complexity by keeping a clean user interface while adapting to the different collaboration workflows and their evolution in a team.
Revision Naming | Git revisions are SHA-1 hashes, which makes them less user-friendly when doing a diff between two revisions. This was chosen to guarantee the safety and integrity of data, and it also happens to avoid collisions when merging with other peers. | Simple naming | Simple revision id naming: r1, r2, etc.
Commands | Familiar, with some specificities such as the rename command, which differs from other SCMs (it won't be changed, for backward compatibility). git is the most advanced SCM in terms of commands, but if you add up all possible commands and their options, you end up with a huge number of possibilities that is hard to master. The fact that a tool like Easy Git exists suggests that Git can be considered quite complex. | Familiar (not far from Subversion) | Familiar
UserBase      
  | Large userbase / Numerous (and large) Projects running git and interest in user feedback | Large userbase / Numerous (and large) Projects running hg. | Smallest Market Share: Apart from Canonical projects (Ubuntu, Launchpad), no big names are using it yet. Bazaar is also less well-known.
  | Linux kernel, Cairo, Wine, X.Org, Rails, Rubinius, Compiz Fusion | Xine, OpenJDK, OpenSolaris, NetBeans, (Part of) Mozilla |
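
To illustrate the Revision Naming row above: Git derives the name of an object from its content by hashing it with SHA-1. The following Python sketch reproduces that naming for the simplest object type, a file's content (a "blob"), by hashing a short header followed by the bytes. The helper name `git_blob_id` and the sample contents are made up for the example; commits and trees are named in a similar way from their own serialized content, which this sketch does not cover.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 name Git gives to a file's content (a blob)."""
    # Git hashes a small header ("blob <size>" plus a NUL byte), then the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two slightly different contents get unrelated 40-character names.
first = git_blob_id(b"print('hello')\n")
second = git_blob_id(b"print('hello!')\n")
```

Because the name depends only on the content, two peers who hold the same content independently arrive at the same id, and any corruption of the data changes the id. The price is the one noted in the table: names are 40 hexadecimal characters instead of r1, r2, and so on.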