
Distributed Version Control Systems: A Not-So-Quick Guide Through
Posted by Sebastien Auvray on May 07, 2008 05:02 PM
Since Linus Torvalds' presentation about git at Google in May 2007, adoption of and interest in Distributed Version Control Systems have been rising steadily. We will introduce the concept of distributed version control, see when to use it and why it may be better than what you're currently using, and have a look at three actors in the area: git, Mercurial and Bazaar.
What?
A Version Control System (or SCM) is responsible for keeping track of several revisions of the same unit of information. It is commonly used in software development to manage source code projects. Historically, the first VCS of choice for projects was CVS, started in 1986. Since then many other SCMs have flourished, each with specific advantages over CVS: Subversion (2000), Perforce (1995), CVSNT (1998), ...
In December 1999, to manage the mainline kernel sources, Linus chose BitKeeper, described as "the best tool for the job". Before that, Linus integrated each patch manually. While all its predecessors worked on a client/central-server model, BitKeeper was the first VCS to allow a truly distributed system in which everybody owns their own master copy. Due to licensing conflicts, BitKeeper was later abandoned in favor of git (Apr 2005). Other systems following the same model are available: Mercurial (Apr 2005), Bazaar (Mar 2005), darcs (Nov 2004), Monotone (Apr 2003).
Why?
Or, more precisely: why are centralized VCSs (notably Subversion) not satisfying?
Several things are blamed on Subversion:
- The major reason is that branching is easy but merging is a pain (and one doesn't go without the other). Any substantial project you work on is likely to need easy juggling of split, development, and test branches. Subversion has no history-aware merge capability, forcing its users to manually track exactly which revisions have been merged between branches, which is error-prone.
- There is no way to push changes to another user without submitting them to the central server.
- Subversion fails to merge changes when files or directories are renamed.
- The trunk/tags/branches convention can be considered misleading.
- Offline commits are not possible.
- .svn files pollute your local directories.
- svn:externals can be troublesome to handle.
- Performance.
Modern DVCSs fix those issues, both through their own implementation tricks and because they are distributed. But as we will see in the conclusion, Subversion has not given up yet.
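To make "history-aware merge" concrete: DVCSs such as git record the parents of every commit, so history forms a directed acyclic graph, and the tool can work out the common ancestor (the merge base) and the commits still to be merged, instead of relying on users to remember which revisions were already merged. Below is a minimal sketch in Python, assuming a toy in-memory graph whose commit names (`A`, `B`, ...) and helper functions are hypothetical; real tools use more efficient algorithms and then perform a three-way merge against the base.

```python
from collections import deque

# Hypothetical toy history: each commit maps to the list of its parent commits.
Graph = dict[str, list[str]]

def ancestors(graph: Graph, commit: str) -> set[str]:
    """Return the commit itself plus every commit reachable through parent links."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in graph[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def merge_bases(graph: Graph, a: str, b: str) -> set[str]:
    """Best common ancestors: common ancestors not behind another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {c for c in common
            if not any(c != other and c in ancestors(graph, other) for other in common)}

def unmerged(graph: Graph, target: str, source: str) -> set[str]:
    """Commits reachable from source that target does not contain yet."""
    return ancestors(graph, source) - ancestors(graph, target)

if __name__ == "__main__":
    # Trunk: A - B - C - E; branch: B - D - F; D was already merged into E.
    history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"],
               "E": ["C", "D"], "F": ["D"]}
    print(merge_bases(history, "E", "F"))  # {'D'}
    print(unmerged(history, "E", "F"))     # {'F'}
```

Because the earlier merge E recorded D as a parent, the graph alone shows that only F remains to be merged; this is the bookkeeping Subversion users had to do by hand.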
How?
Decentralization
Distributed Version Control Systems take a peer-to-peer approach. Clients can communicate with each other and maintain their own local branches without going through a central server or repository. Synchronization then takes place between peers, who decide which changesets to exchange.
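The synchronization step can be pictured as copying the changesets one peer lacks. Because changesets are immutable and identified by ID, a peer asks for one head and receives only the ancestors of that head it does not already have. A minimal sketch, assuming a hypothetical `Peer` class that holds changesets in memory; real tools negotiate this over a network protocol rather than exchanging Python objects.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    """Hypothetical peer: changeset ID -> tuple of parent changeset IDs."""
    name: str
    changesets: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def commit(self, cid: str, *parents: str) -> None:
        # Recorded locally; no central server is contacted.
        self.changesets[cid] = parents

    def pull(self, other: "Peer", head: str) -> set[str]:
        """Fetch `head` and every ancestor of it that this peer lacks."""
        fetched: set[str] = set()
        stack = [head]
        while stack:
            cid = stack.pop()
            # Stop at changesets we already hold: their ancestry is already local.
            if cid in self.changesets or cid in fetched:
                continue
            fetched.add(cid)
            stack.extend(other.changesets[cid])
        for cid in fetched:
            self.changesets[cid] = other.changesets[cid]
        return fetched

if __name__ == "__main__":
    alice, bob = Peer("alice"), Peer("bob")
    alice.commit("c1")
    alice.commit("c2", "c1")
    alice.commit("x1", "c1")               # experimental work alice keeps to herself
    print(sorted(bob.pull(alice, "c2")))   # ['c1', 'c2']
    bob.commit("c3", "c2")
    print(sorted(alice.pull(bob, "c3")))   # ['c3']
```

Here Bob chooses to pull only Alice's `c2` line, so her experimental `x1` never leaves her repository.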

The decentralized model results in some striking differences from, and advantages over, a centralized system:
- No canonical, reference copy of the codebase exists by default; only working copies.
- Disconnected operations: common operations such as commits, viewing history, diffs, and reverting changes are fast, because there is no need to communicate with a central server. A central server can still exist (for a stable, reference, or backup version), but if distribution is used well, it is queried far less often than in a centralized setup.
- Each working copy is effectively a remote backup of the codebase and its change history, providing natural protection against data loss.
- Experimental branches: creating and destroying branches are simple and fast operations.
- Collaboration between peers is made easy.
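Why are branches so cheap? In git, for example, a branch is only a named reference to a commit: creating one writes a pointer, and deleting one removes it, without copying any files. The toy model below illustrates that idea; the `Repo` class and its methods are hypothetical and do not reproduce git's on-disk layout.

```python
class Repo:
    """Toy repository: commits never change; branches are names pointing at commits."""

    def __init__(self) -> None:
        self.parents: dict[str, tuple[str, ...]] = {"root": ()}
        self.branches: dict[str, str] = {"master": "root"}
        self.current = "master"

    def commit(self, cid: str) -> None:
        # The new commit's parent is the current tip; then advance the branch pointer.
        self.parents[cid] = (self.branches[self.current],)
        self.branches[self.current] = cid

    def branch(self, name: str) -> None:
        # Constant-time: just a new name for the current tip.
        self.branches[name] = self.branches[self.current]

    def checkout(self, name: str) -> None:
        self.current = name

    def delete_branch(self, name: str) -> None:
        # Only the pointer disappears; the commits themselves stay in `parents`
        # (git would later garbage-collect commits that nothing references).
        if name == self.current:
            raise ValueError("cannot delete the checked-out branch")
        del self.branches[name]

if __name__ == "__main__":
    repo = Repo()
    repo.commit("a")
    repo.branch("experiment")
    repo.checkout("experiment")
    repo.commit("b")
    repo.checkout("master")
    print(repo.branches)  # {'master': 'a', 'experiment': 'b'}
    repo.delete_branch("experiment")
    print(repo.branches)  # {'master': 'a'}
```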
For an introduction to DVCS collaboration practices, you might look at the Intro to Distributed Version Control (Illustrated) or possible Collaboration workflows.
You should also be aware that opting for a DVCS has some disadvantages, notably in terms of complexity: the decentralized view is very different from the centralized world, and your developers may need some time to get used to it. Tracking changesets instead of files can also be confusing, even though it is very powerful and theoretically makes it possible to track a method as it moves from one file to another.
Who?
The battle rages on! Some of the Good and the Bad.
The good and the bad below come essentially from a compilation of blog posts (updated, because some older arguments no longer hold) and from my personal experience.
Note that this is a very short list of features (git alone has more than 150 commands), and some issues may be more critical than others.
| | git | Mercurial | Bzr |
|---|---|---|---|
| **Project** | | | |
| Maintainer | Junio C Hamano | Matt Mackall | Canonical Ltd. (became a GNU project) |
| Concurrency model | Merge | Merge | Merge |
| License | GPL | GPL | GPL |
| Platforms supported | POSIX, Windows, Mac OS X | Unix-like, Windows, Mac OS X | Unix-like, Windows, Mac OS X |
| Cost | Free | Free | Free |
| **Maturity** | | | |
| Version | | | |
| Project start | Apr 2005 | Apr 2005 | Mar 2005 |
| **Implementation** | | | |
| SLOC count (without test sources) | 130550 | 38172 | 79864 |
| Test suites | | | |
| History model | Snapshot | Changeset | Snapshot |
| Repo. growth | O(patch) | O(patch) | O(patch) |
| Network protocols | HTTP, FTP, email bundles, custom, ssh, rsync | HTTP, ssh, email | HTTP, SFTP, FTP, ssh, custom, email bundles |
| **Basic features** | | | |
| Atomic commits | | | |
| File renames | | | |
| Merge file renames | | | |
| Symbolic links | | | |
| Pre/post-event hooks | | | |
| Signed revisions | | | |
| Merge tracking | | | |
| End-of-line conversions | | | |
| Tags | | | |
| International support | | | |
| Partial checkout | | | |
| **Model / Architecture** | | | |
| Files | Single top-level .git directory | Single top-level .hg directory | Single top-level .bzr directory |
| Model | Simple branch model (a clone is a branch) | Simple branch model (a clone is a branch) | |
| Repository specifics | | | Shared repositories for sharing revisions between branches; supposedly better storage model |
| Directories versionable | | | |
| Submodules | git-submodule | | |
| Per-file commit | | | |
| Rebase / queue | rebase | | |
| Web access | gitweb, wit, cgit | hgweb (single repository), hgwebdir (multiple repositories) | webserve, trac, Loggerhead |
| **Integration** | | | |
| Integration-ability | | | |
| Migration | git-svn is a very powerful, easy-to-set-up bidirectional gateway between Subversion and Git, allowing you to use Git over an existing Subversion repository. | hgsvn, not as polished as git-svn. | |
| Issue tracker integration | | | |
| IDE plugins | | | |
| Plugins | | | |
| Performance | | | |
| **Advanced features** | | | |
| Complexity | | | Bzr claims to hide complexity by keeping a clean user interface while adapting to the different collaboration workflows and their evolution within a team. |
| Revision naming | | | |
| Commands | rename command that differs from other SCMs (won't be changed, for backward compatibility). git is the most advanced SCM in terms of commands, but once you add all possible commands and their options you end up with so many possibilities that it is hard to master. The fact that a tool like Easy Git exists suggests that git can be considered quite complex. | | |
| User base | Linux kernel, Cairo, Wine, X.Org, Rails, Rubinius, Compiz Fusion | Xine, OpenJDK, OpenSolaris, NetBeans, (part of) Mozilla | |
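A "Snapshot" history model with O(patch) repository growth may look contradictory: how can storing full snapshots stay proportional to the change? In git's case, the answer is content addressing: each file version is stored under the SHA-1 hash of its content, so a file left unchanged in a new snapshot is only a reference to an object that already exists. The sketch below computes git's blob ID (SHA-1 over the header "blob", a space, the size in bytes and a NUL byte, followed by the content) and uses a plain dictionary as a hypothetical object store; real git also zlib-compresses objects and delta-compresses them in packfiles.

```python
import hashlib

def blob_id(data: bytes) -> str:
    """git-style blob ID: SHA-1 of b'blob SIZE' + NUL byte + content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def snapshot(store: dict[str, bytes], files: dict[str, bytes]) -> dict[str, str]:
    """Hypothetical helper: record a full snapshot as path -> blob ID.

    Content already present in `store` is not stored again, so an unchanged
    file adds nothing to the repository.
    """
    tree: dict[str, str] = {}
    for path, data in files.items():
        oid = blob_id(data)
        store.setdefault(oid, data)
        tree[path] = oid
    return tree

if __name__ == "__main__":
    store: dict[str, bytes] = {}
    v1 = snapshot(store, {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"})
    v2 = snapshot(store, {"README": b"hello\n", "main.c": b"int main(void) { return 1; }\n"})
    print(len(store))     # 3: README is stored once, main.c twice
    print(blob_id(b""))   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (git's empty blob)
```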







