Version Control, Git, and your Enterprise
Understanding Git – both its benefits and limits – and deciding if it’s right for your enterprise
This article highlights some of the key advantages and disadvantages of Git that enterprises typically experience. It then presents the key questions your enterprise should consider in determining whether Git is right for you and what a move to Git involves.
Git is not a commercial tool you buy, but another in a long line of successful open source version control tools. It was started after a parting of ways between the Linux kernel development community and a commercial tool, BitKeeper. A 2014 Eclipse Community Survey showed Git to be the leading code management tool, used by a third of the software developers surveyed. While that data is skewed toward open source, there is no question that Git is also the rising choice for developers within enterprises.
Pros of Git
To start to assess whether Git is the right choice for your team or enterprise, it is helpful to know what makes Git attractive and different. The most striking difference is that everything is local, starting with the repository and carrying all the way into the operations. Everything you expect to be able to do with a version control tool, from history to branching and merging, is available in Git. Working locally brings an inherent speed advantage, since communication over a network to a server is minimized. And since all data is local, each user can work disconnected from any server, virtually anywhere they and their computer may find themselves.
Merging is often the least welcome part of doing version control, but Git makes it better by providing powerful and specific merge strategies for different merging use cases. Whether you want to merge branches, rebase your branch onto a new starting point, or selectively apply individual commits, Git has functionality to address it. And the distributed model really frees up the developer to work with version control in the manner that brings them the most value: commit early and often locally, then deliver changes to users upstream in larger and more logical chunks.
Developers want the power to do whatever they may find a need to do. With Git they have more granular control of what is done and how than they have experienced with other tools. Git’s large array of operations is often split into two categories – the porcelain and the plumbing. The analogy is to something like a sink: traditional tools let you interact only with the porcelain, that is, the abstraction and controlled interface to the tool, but Git also lets you get under the basin and behind the faucets to change how version control is executed, including rewriting history. Whether a developer needs that power or not, they like knowing that it is at their disposal.
Cons of Git
But like all development tools, Git is not a perfect solution. It is true that Git is one of the fastest-growing version control tools on the marketplace, yet there is still a significant divide not only between its adoption for enterprise use and for open source development, but also between being adopted by individual development teams and being embraced by the enterprise. That divide exists because of the perceived cons of the tool – which, like its pros, are compelling.
In 2013, CollabNet conducted a survey that asked enterprises what their concerns were with using Git. The top three responses were security, governance and tool integration – with director-level and higher employees citing security as their number one concern. So why are these three areas a roadblock to enterprise adoption of Git?
It goes back to the origins of the tool itself. It was originally written to implement the process of a specific development approach. The creators knew what they wanted from the tool, but had limited interest in how others would use it. Of course, Git has evolved over time and reacted to the requirements and needs of others. But it remains very much engineered around an open source type of development which leaves gaps for enterprise development.
In open source development, the workflow is very different: the needs for branching are fewer, the interest in governance is almost non-existent, and security is simpler, given that, by the nature of open source, the world has read access to the code while write access is given to only a select few. While enterprise needs vary, as do project team needs within a given enterprise, enterprises commonly require more branching, desire true governance, need more granular access controls, and require additional security measures like tying into their corporate LDAP.
The issue for enterprises is not really how Git functions for the individual developer, but how the required canonical repository and upstream processes are supported. Since Git has no internal concept of a canonical repository, the need to protect the central point of truth has to be supported in some other way. For example, rewriting history locally is very useful to the developer, but rewriting the history in the canonical server is an inherent risk to true configuration management processes.
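To make the history-rewriting risk concrete, the following is a minimal sketch of the kind of check a canonical server can apply before accepting an update to a branch: the update is accepted only if the branch's current tip is still part of the history being pushed (a "fast-forward"). The in-memory commit graph and the function names are illustrative assumptions, not Git's implementation. Git itself exposes comparable behavior through the receive.denyNonFastForwards configuration setting and the git merge-base --is-ancestor command.

```python
from typing import Dict, List

# Hypothetical in-memory commit graph: commit id -> list of parent commit ids.
Graph = Dict[str, List[str]]


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return False


def accept_push(graph: Graph, old_tip: str, new_tip: str) -> bool:
    """Accept only fast-forward updates: the old tip must survive in the new history."""
    return is_ancestor(graph, old_tip, new_tip)


# Illustrative history: "c" builds on "b", while "b2" is a rewritten replacement for "b".
graph = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "b2": ["a"],
}

print(accept_push(graph, "b", "c"))   # new work on top of the existing tip
print(accept_push(graph, "b", "b2"))  # history rewritten underneath the existing tip
```

In this sketch the first push would be accepted and the second rejected, which is the protection a canonical repository needs and a developer's local repository does not.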
Those different requirements have resulted in many different efforts to address enterprise needs, while still leveraging the many benefits of Git. They are not necessarily “forks” in the traditional sense, but rather add-ons, or wrappers, which try to address the same issues in different ways. That means that one development team will select an architecture for using Git that may be different than an architecture another team in the same enterprise chooses to use. And that is something that is definitely not in the best interests of the enterprise as a whole.
Truly, one team’s Git is not another team’s Git. The enterprise routinely sees teams that have unique tools, processes and practices claiming the same base tool as a foundation. They quickly find that these unique solutions limit the enterprise’s own agility and scalability. There is no clear way to get visibility into all the applications, given that they reside in all sorts of different environments and locations; and without visibility, organizations cannot maintain the governance needed to be sure people are following standards in their processes. Resources get used to support the various solutions, when that time could be better spent delivering value and innovation.
Additionally, Git has inherent issues around how large a repository can grow and still be effectively used by multiple development teams. Added to that are its limitations in supporting large binaries. As a result, larger applications require multiple repositories to control all the artifacts involved, which brings additional deficiencies to light. For example, enterprise teams want to be able to perform operations (like branching and tagging) on all of their application code without regard to the number of repositories involved. They would like to execute those operations on all related repositories simultaneously, but there is no concept of an atomic cross-repository operation. No functionality exists in core Git to address this issue, so enterprises are forced to consider adding third-party functionality like Git Slaves or the Android community’s repo tool.
On the flip side, Git branching and merging apply to the whole of a repository, which limits Git repositories to a one-to-one relationship with an application. If an enterprise has utilized a single repository to store the assets for multiple, likely related, applications, it will be forced to create individual repositories for each of those applications in order to isolate development and document milestones for the specific application. The impact goes beyond the larger number of repositories by requiring modifications to build, checkout and setup processes and scripts to account for the changed layout.
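As an illustration of what an all-or-nothing cross-repository operation has to do, here is a minimal sketch that applies a tag to a set of repositories only if every one of them can accept it, and undoes partial work if a later step fails. The repositories are modeled as plain in-memory sets of tag names, and the function name is hypothetical; this is neither how core Git nor any particular third-party tool works, only the shape of the problem such tools have to solve.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for real repositories: repository name -> set of tag names.
Repos = Dict[str, Set[str]]


def tag_all(repos: Repos, names: List[str], tag: str) -> bool:
    """Apply `tag` to every named repository, or to none of them."""
    # Phase 1: check every repository before changing any of them.
    for name in names:
        if name not in repos or tag in repos[name]:
            return False

    # Phase 2: apply the tag, remembering what was done so it can be undone.
    # With in-memory sets this step cannot fail; in a real wrapper each step
    # would run a Git command that might fail partway through the list.
    done: List[str] = []
    try:
        for name in names:
            repos[name].add(tag)
            done.append(name)
    except Exception:
        for name in done:
            repos[name].discard(tag)
        raise
    return True


# Illustrative data: "lib" already carries v1.0, so tagging both with v1.0 is refused.
repos = {"app": set(), "lib": {"v1.0"}}
print(tag_all(repos, ["app", "lib"], "v1.0"))
print(tag_all(repos, ["app", "lib"], "v2.0"))
```

Even with the check-first approach, this is only an approximation of atomicity: another user could change one of the repositories between the two phases, which is exactly the gap the article describes.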
In addition, the underlying cloning concept behind Git can lead users and project teams to fork applications and modules rather than more appropriately using branching. A new release of an application should not be supported by a new repository based on the previous release, but instead supported by a branch for that new release in the existing repository. The same should be true when a team needs to modify a shared component or library. Branches allow for isolation of work and even of unique variations of a common base, but provide all of that from a common repository that all users of that code can evaluate and use.
Finally, even where it is not a widespread reality in an enterprise, code reuse and the use of third-party libraries are at least seen as desirable, but again Git has only imperfect responses to these requirements. An organization can attempt to use Git submodules, which are present in the standard Git binaries, but that will require additional implementation knowledge and operational steps from every developer. It can instead choose to use the contributed Git subtrees functionality, but then has to be sure that all those charged with establishing such relationships pull down the code and use it appropriately. Neither approach is without issues, leaving enterprises wishing for better solutions.
Questions to Consider
In these ways, the pros and cons of Git pull strongly in different directions. If you are thinking of migrating existing products/applications from other version control tools to Git, you should consider the following questions:
- What process will best meet your requirements for version control?
- What are your organization’s governance and security needs?
- What all needs to be integrated with Git and what data needs to be migrated?
- How will you train users on Git and your processes?
First, what process will best meet your requirements for version control? Too many organizations think that while they have to migrate data into a different structure, they can continue using the same processes without making changes. They fail to realize that those processes are likely molded by what their current tool does (and does not) do well. To try and apply that process to a radically different tool like Git is likely to cause a “square peg in a round hole” situation. Instead, go back to your base requirements and apply Git’s best practices to arrive at processes that will best meet your needs and take advantage of all that Git has to offer.
People often think that there is a singular best practice model for branching and merging with a particular tool. Branching is driven by a need for isolation, not by a tool’s functionality. GitFlow is often mistaken for Git’s branch and merge model, but outside of the name and the associated scripts, it has no real affinity towards being used with Git versus another version control tool.
The basis of most, if not all, branching models is one of two foundational approaches, with the choice based on the need to support concurrent release development. The unstable master approach is designed to work with development that is primarily focused on a single new release at any point in time, which is how most open source development is done given the transient nature of its workforce. The stable master approach is designed for development that has multiple releases underway simultaneously. That is the case with much of the development done in enterprises.
GitFlow appears to be based on the stable master foundation, since work is not done directly on the master branch, but it really is designed for the serial release approach of the unstable master foundation, having just moved the instability to a general, ongoing development branch. GitFlow’s variations make it less attractive as a best practice and certainly do not make it the tool’s standard approach for enterprise adoption.
Next, you need to ask yourself, what are your organization’s security and governance needs? Do you need to control access at a level deeper than a repository? Do you need to define general access rules via roles that can be applied across application teams? Do you need to leverage a single account and password source? Do you have defined processes that teams are expected to follow that you need to be able to monitor? Answering questions like these will help you define the access controls and review processes that will meet those requirements – or may help you determine that Git is not the best fit for your enterprise or that specific project.
Having security and governance requirements does not mean that Git cannot be used, but it does mean that an appropriate wrapper must be leveraged that provides this additional functionality. It is important to identify one that can meet the varying needs that are likely present across the many development teams in your organization. It has to be able to let changes be applied directly to the canonical repository when governance is not required, but enforce processes that must be adhered to before changes are applied, when required, while allowing for such processes to be defined at the team level. It is important to commit to a single choice of wrapper in order to support the enterprise level requirements for centralized governance and visibility.
Gerrit is an open source solution that meets these requirements. While Gerrit was designed specifically to support code reviews, and implements that functionality in a way that only makes approved commits publicly available, it additionally provides functionality to support an enterprise’s needs for security and governance. It treats each repository as a project, allowing unique security and governance to be applied. It allows access controls to be uniquely defined down to the branch level and to specific operations, including tagging and pushing merges. And it provides an integration to LDAP for enterprises that require it. It is an excellent example of an enterprise Git solution.
Next, determine what needs to be integrated with Git and what data needs to be migrated. You need to make sure you have a platform and partner to succeed in this area, to get the tools you need for success. You also have to honestly evaluate what data needs to be migrated into Git. For some environments, full history migrations are not needed (or even possible).
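The idea of access controls defined per branch and per operation, and granted through roles, can be sketched as follows. This is a hypothetical, deny-by-default rule table for illustration only; the rule fields, operation names, and role names are assumptions and do not reflect Gerrit's actual configuration format or permission names.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Set


class Rule(NamedTuple):
    ref_pattern: str  # which refs the rule covers, e.g. "refs/heads/release/*"
    operation: str    # hypothetical operation name, e.g. "push" or "create_tag"
    role: str         # role that is allowed to perform the operation


def is_allowed(rules: List[Rule], roles: Set[str], ref: str, operation: str) -> bool:
    """Deny by default; allow only if some rule matches the ref, operation, and a held role."""
    return any(
        rule.operation == operation
        and rule.role in roles
        and fnmatchcase(ref, rule.ref_pattern)
        for rule in rules
    )


# Illustrative rules: developers push to feature branches, while only
# release managers may push merges to release branches or create tags.
RULES = [
    Rule("refs/heads/feature/*", "push", "developer"),
    Rule("refs/heads/release/*", "push_merge", "release-manager"),
    Rule("refs/tags/*", "create_tag", "release-manager"),
]

print(is_allowed(RULES, {"developer"}, "refs/heads/feature/login", "push"))
print(is_allowed(RULES, {"developer"}, "refs/heads/release/1.0", "push_merge"))
```

Defining the rules against roles rather than individual users is what lets the same table be reused across application teams, with membership in each role coming from a single account source such as LDAP.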
The days of version control tools standing as their own island in the sea of software development are long gone. Enterprises need to have seamless connections to integrated development environments (IDEs), issue management tools, testing tools, build tools, and DevOps. And many of those tools need to be integrated with each other as well. Identifying what your organization uses and how it can benefit from duplicating existing integrations with version control, as well as adding additional ones, will bring maximum value from your Git environment through a platform that supports such integrations.
If you have ever moved your personal belongings, then you can relate to the challenges of moving data from one version control tool to another. You need to evaluate what you truly need in your new living space, what is useless, and what is not needed now but needs to be available if it does become required. Have you ever helped someone move who just packed everything? Not a pretty picture, especially if they moved into smaller quarters. Version control is history, but most of that history is of value for only a limited period of time. And the various tools do not provide the same features, implement common features in the same way, or store things in the same manner. Evaluate what can be migrated and what really needs to be migrated. Leave the rest in the storage of the legacy tool.
Finally, how will you train users on Git and your processes? Do not just assume that because it is an open source tool, users will simply “catch on.” Git is a complicated tool and a drastic departure from “business as usual.” Use lunch-and-learns, web-based learning, and instructor-led trainings to bring your team up to speed. Give your subject matter experts deeper training, with more interaction with experts and additional hands-on opportunities, so they can provide the much-needed first line of support for Git within your enterprise. Users need to be educated on the tool’s functionality, separate from training on how your enterprise or their project team will use it. Process training is always important, but should be delivered separately.
Git has many benefits and is the fastest-growing version control solution for a reason. By understanding its benefits, its limitations, how it functions, how it meets the needs of your business, and what you need to supplement it with, you can decide if it is the right tool for your enterprise.
About the Author
Bob Jenkins is Director of Version Control services at CollabNet. His background includes 19 years focused on Application Lifecycle Management tools with a particular focus on version control, from ClearCase to Subversion and Git. At CollabNet (where he has been for 14 years), he primarily focuses on consulting with enterprises planning to adopt Git and Subversion, along with developing end user training materials for both version control tools. He has consulted with hundreds of enterprises to assist them with successfully implementing their version control tool of choice.