Atlassian's Stash Data Center Offers High Availability and Scalability for Git

Atlassian recently released Stash Data Center, a highly available and horizontally scalable deployment option for its on-premises source code and Git repository management solution Stash. New nodes can be added without downtime to provide active-active clustering and instant scalability.

Stash Data Center joins the JIRA and Confluence Data Center editions, which are designed for enterprise scenarios that require “high availability and performance at scale”. All offerings provide the same end-user features as the single-server products and employ similar clustering technology to improve concurrent user capacity, application resilience and quality of service. They are licensed by user count rather than by the number of servers or CPUs, which allows custom and flexible infrastructure choices at predictable costs.

Atlassian Stash Clustering Overview

Stash Server is "used by more than 13,000 organizations worldwide" and includes corporate environment oriented features such as:

Stash Server already offers detailed guidance for enterprise usage, high availability and scaling. Even higher demands can now be addressed by a Stash Data Center cluster:

InfoQ caught up with Eric Wittman (General Manager Developer Tools) about Atlassian’s latest data center offering.

InfoQ: Stash Server provides vertical scaling to thousands of users on a single server already and is being used by small and large companies alike. What have been the main drivers to further improve scalability with Stash Data Center?

Wittman: While you can scale up for a large number of users, scaling up always has limits based on physical servers and we wanted to avoid this by also providing the ability to scale horizontally. Besides having customers looking to scale usage beyond 10,000 users, the other driver for increasing scalability with Stash Data Center is to handle the heavy demand build servers can place on an SCM system during peak load as organizations increase their CI practices.

InfoQ: Stash Data Center is documented to scale horizontally almost linearly to at least 4 nodes and, given the user-based pricing, you are encouraging customers to "add as many nodes as you want". Is there an upper limit to the number of users supported?

Wittman: We do not have a hard number on the upper limit of users that Stash Data Center can support. We have tested up to 4 nodes and when we measure the scalability we look at the overall throughput a cluster can handle. The number of users that can be supported will be a function not only of the number of nodes, but also the load placed on the system from automated systems like CI.

InfoQ: Scaling Git is considered a significant technical challenge. Can you touch briefly on how you achieved this, for example, did you have to make changes to the way Git operates by default?

Wittman: Scaling out over multiple machines adds CPU, memory, and local disk caches that help with resource usage, especially for the Git hosting operations. Our SCM cache also benefits from additional fast local disks on the cluster nodes. Other than that, we did not make changes to Git itself.

  • An in-depth look into the underlying Git concepts, resulting challenges, and mitigation has been provided by Stefan Saasen (Architect Atlassian Stash) in his presentation Scaling Git at Atlassian Summit 2014.

InfoQ: Stash is usually used jointly with other Atlassian tools like JIRA and Bamboo, due to the excellent workflow integration. Do you also have customers only using Stash in isolation?

Wittman: While we do have some customers that only have Stash and take advantage of Stash's granular permissions, most Stash customers take advantage of both the granular code access controls as well as the integrated workflows with JIRA and Bamboo.

InfoQ: Your colleague Tim Pettersen has recently explained the "better pull request model" employed by Stash and Bitbucket, despite the additional resources required to implement this more complex algorithm. Can you summarize why you favor this approach?

Wittman: The pull request algorithm employed by Stash and Bitbucket has two major advantages over that employed by other Git solutions:

  1. Merge conflicts can be displayed in the pull request, allowing developers whose code is conflicting to discuss how the conflicts should be resolved.
  2. Reviewers can see how the changes on a feature branch will affect the master branch, giving a better picture of the code that will actually ship in the product and ultimately reducing the number of defects shipped to customers.
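The two points above can be illustrated with a toy model. The following is only a sketch under simplifying assumptions, not Atlassian's implementation: branches are modeled as plain dictionaries mapping a path to file content, the merge works at whole-file granularity rather than on lines, and the names `three_way_merge`, `changed_paths` and `pull_request_view` are hypothetical. It shows the general idea of diffing the merged result against the target branch, so that conflicts and the effective change to the target both become visible to reviewers.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a Git tree: path -> file content (assumption for illustration).
Tree = Dict[str, str]


def three_way_merge(base: Tree, target: Tree, source: Tree) -> Tuple[Tree, List[str]]:
    """Merge source into target at whole-file granularity.

    Returns the merged tree and the list of conflicting paths.
    """
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(target) | set(source)):
        b, t, s = base.get(path), target.get(path), source.get(path)
        if t == s:
            result = t  # both sides agree (or both deleted the file)
        elif t == b:
            result = s  # only the source branch changed this path
        elif s == b:
            result = t  # only the target branch changed this path
        else:
            # Both sides changed the path differently: record a conflict
            # and keep both versions so reviewers can discuss them.
            conflicts.append(path)
            result = f"<<<<<<< target\n{t}\n=======\n{s}\n>>>>>>> source"
        if result is not None:
            merged[path] = result
    return merged, conflicts


def changed_paths(old: Tree, new: Tree) -> List[str]:
    """Paths whose content differs between two trees."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


def pull_request_view(base: Tree, target: Tree, source: Tree) -> Tuple[List[str], List[str]]:
    """Return (paths changed on the target by merging, conflicting paths)."""
    merged, conflicts = three_way_merge(base, target, source)
    return changed_paths(target, merged), conflicts


if __name__ == "__main__":
    base = {"a.txt": "1", "b.txt": "1"}
    master = {"a.txt": "2", "b.txt": "1"}                   # master edited a.txt
    feature = {"a.txt": "3", "b.txt": "1", "c.txt": "new"}  # feature edited a.txt, added c.txt
    changed, conflicts = pull_request_view(base, master, feature)
    print("changed on master after merge:", changed)
    print("conflicts:", conflicts)
```

In this toy example the reviewer would see that merging the feature branch touches `a.txt` and `c.txt` on master and that `a.txt` is in conflict. A diff computed only between the feature branch and the common ancestor would list the same two files without revealing the conflict.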

InfoQ: Atlassian has also released Stash on Docker, so far constrained to evaluation purposes. Does this cover clustered deployments too and are you considering this to be a supported product deployment option down the road?

Wittman: The Docker deployment option currently does not cover clustered deployments as the main use case is evaluation by developers. Depending on Docker maturing as a platform and customer demand, we will evaluate making the Docker image a supported deployment option for production in the future.

Stash 3.8, recently released, further improves on operational aspects by introducing completely headless provisioning and JMX performance counters for "the number of projects and repositories, the number of Git pushes and pulls, and various thread pool metrics".

The Stash Data Center documentation provides more details, including sections regarding failover, performance, scalability, and a FAQ. The Stash user documentation applies as well, and the developer documentation covers extending Stash through plugins or the remote REST APIs. Regular Stash support resources are offered via the Atlassian support portal. Dedicated Enterprise services and support programs are also available.