Use the Force, Luca - Jenkins Developer Wipes out a Month of Commits on GitHub
Yesterday, a developer on the Jenkins project accidentally triggered a force push to the GitHub repositories that host the Jenkins codebase, wiping out several months of commits. The community were very understanding and the problem was quickly resolved, but the incident does highlight an area where GitHub's openness (coupled with the Jenkins CI organisation's policy of allowing anyone to commit to any repository) can magnify problems when they occur.
A Git forced push, run with git push --force, tells the server to replace the content of the references (branches/tags) being pushed with the content given. Normally, a Git repository will only allow fast-forward pushes; that is, pushes where the pushed reference has the current reference as an ancestor. A force push removes that restriction, allowing the pushed content to replace what was there before.
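The fast-forward rule can be illustrated with a minimal sketch over a toy commit graph. The commit names and the parents mapping below are hypothetical, and this models the rule rather than reproducing Git's own implementation.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def is_fast_forward(parents, current_tip, pushed_tip):
    """A push is a fast-forward when the server's current tip is an ancestor
    of the tip being pushed."""
    return is_ancestor(parents, current_tip, pushed_tip)


# Hypothetical history: a <- b <- c <- d, plus a divergent commit x based on a.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "x": ["a"]}

print(is_fast_forward(parents, "c", "d"))  # new work on top of the server tip
print(is_fast_forward(parents, "c", "x"))  # divergent history
print(is_fast_forward(parents, "c", "a"))  # stale clone pushing an older tip
```

Only the first case is a fast-forward. The last case resembles what the article describes: pushing an out-of-date tip is not a fast-forward, so it only succeeds when forced.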
Git repositories can be configured to allow or deny this with the git config value receive.denyNonFastForwards. Setting it to true prevents such force pushes from being accepted.
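How the force flag and the server setting interact can be summarised as a small decision function. This is a simplified model, not Git's code: the function name and the outcome strings are invented for illustration, and the only real name used is the receive.denyNonFastForwards setting.

```python
from itertools import product


def push_outcome(is_fast_forward, forced, deny_non_fast_forwards):
    """Simplified model of what happens to a single ref update."""
    if is_fast_forward:
        # Fast-forward updates are always fine.
        return "accepted"
    if not forced:
        # A non-fast-forward push without --force is refused.
        return "rejected: non-fast-forward without force"
    if deny_non_fast_forwards:
        # receive.denyNonFastForwards=true refuses the update even when forced.
        return "rejected: server denies non-fast-forwards"
    return "accepted, history replaced"


# Print the full decision table for every combination of the three inputs.
for ff, forced, deny in product([True, False], repeat=3):
    print(ff, forced, deny, "->", push_outcome(ff, forced, deny))
```

The only combination that rewrites history is a forced, non-fast-forward push to a server that does not deny it.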
There are cases where allowing a force push is useful; for example, if a refactoring or a filtering operation such as git filter-branch is executed, then the rewritten commits will not have the current remote reference as an ancestor, and so a normal push won't work. Another use case is mirroring, which synchronises the content of two repositories and where you want the changes to go through without erroring.
That's what happened in this case: Luca was testing the Gerrit mirroring plugin, and had a set of the Jenkins repositories checked out locally. The Gerrit mirror was set up to take content from these local repositories, which resulted in all of the GitHub repositories being mirrored from his local checkouts. Unfortunately, since the local repositories hadn't been updated in a while, the net effect was to reset the GitHub repositories to a prior state.
Fortunately, all the repositories have been restored at this point. One of the advantages of the Git version control system (or any DVCS) is that you can repopulate a repository from any of its clones, and it was easy enough to do this. GitHub support were very helpful and provided information from the server-side reflogs (which record changes to branches) to help recover the content. But it does raise two questions about how to mitigate this in future:
- Does it make sense to have users committing against multiple repositories, or should changes come in through a managed channel such as a review/pull request?
- Does it make sense for GitHub to offer the configuration option to deny non-fast-forward pushes?
GitHub's main competitor, BitBucket, does provide an option to deny non-fast-forward pushes. BitBucket was taken over by Atlassian and used to be the canonical location of the also-ran DVCS, Mercurial. However, BitBucket's growth has been in Git hosting solutions, and their Git management solution, Atlassian Stash, is focussed solely on Git repositories.
Ironically, Luca runs a company called GerritForge, which provides Gerrit-based repositories, and has authored a book, Learning Gerrit Code Review, recently reviewed by InfoQ. Perhaps if the Jenkins repositories had been using a review-based tool such as Gerrit, this might not have happened.
Until such time as GitHub offer the configuration for nonFastForwards, the Jenkins developers are writing a tool to track pushes to the GitHub repositories and record the changes that are made, along with the SHAs of the commits. Ironically, they propose using rsync to back them up to multiple locations.
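The idea of recording pushes along with their SHAs can be sketched as a comparison of two snapshots of a repository's references. This is not the Jenkins developers' tool; the function name, the snapshot format, and the shortened SHAs are all hypothetical.

```python
def diff_refs(before, after):
    """Compare two {ref_name: sha} snapshots and list every ref that changed.

    A value of None means the ref did not exist in that snapshot.
    """
    changes = []
    for ref in sorted(set(before) | set(after)):
        old, new = before.get(ref), after.get(ref)
        if old != new:
            changes.append({"ref": ref, "old": old, "new": new})
    return changes


# Hypothetical snapshots taken before and after a push (SHAs are made up).
before = {
    "refs/heads/master": "c3c3c3c",
    "refs/heads/feature": "f9f9f9f",
    "refs/tags/v1": "a1a1a1a",
}
after = {
    "refs/heads/master": "a1a1a1a",  # branch moved back to an older commit
    "refs/tags/v1": "a1a1a1a",       # unchanged, so not reported
}

changes = diff_refs(before, after)
for change in changes:
    print(change["ref"], change["old"], "->", change["new"])
```

Keeping the old SHA for each change is what makes recovery possible: as long as some clone still has that commit, the reference can be pointed back at it.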
With great power comes great responsibility, and GitHub's use of the force certainly carries that risk. Whether or not GitHub provides an option to prevent this in the future, it's worth being aware of the danger if you host large enterprise repositories that aren't backed up.
CollabNet History Protection - A possible solution to accidental force pushes
More details can be found on luksza.org/2012/cool-git-stuff-from-collabnet-p...