Git 2.25 Improves Support for Sparse Checkout

Git maintainer Junio C Hamano announced the latest release of Git, version 2.25, including over 500 changes since 2.24. Most notably, Git 2.25 adds a new command to manage sparse checkouts, mostly useful with huge or monolithic repositories.

Sparse checkouts are one of several approaches Git supports to improve performance when working with big repositories. Specifically, a sparse checkout keeps your working directory uncluttered by populating it only with the directories you specify. This is useful, for example, with repositories containing thousands of directories.

Sparse checkouts have long been available to Git users, although mostly hidden behind configuration files. In Git 2.25, instead, sparse checkouts get their own sparse-checkout command, with init, list, set, and disable subcommands.

Previously, sparse checkouts were of limited usefulness: although they uncluttered the working directory, they still required a full clone, which can become quite expensive in terms of download time and disk space. As it happens, though, earlier Git releases introduced partial clone support, which sparse checkouts complement perfectly.

The "Partial Clone" feature is a performance optimization for Git that allows Git to function without having a complete copy of the repository. The goal of this work is to allow Git to better handle extremely large repositories.

Partial cloning aims to avoid the toll exacted by downloading objects that are of no interest to the user. This includes, for example, files and directories the user is not going to use, as well as previous versions of binary assets that are not referenced anywhere. Partial clones do not operate at the DAG level, as shallow clones and single-branch clones do. Instead, they work by specifying a filter that limits which objects are fetched. For example,

git clone --filter=blob:none <repo>    # omits all blobs
git clone --filter=blob:limit=Nm <repo>   # omits blobs larger than N MB
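
To see what a filter actually leaves out, the following is a minimal sketch that runs entirely against a throwaway local repository. The directory names (server, client), the file a.txt, and the demo identity are hypothetical. It assumes the serving side has to opt in to filtered clones through the uploadpack.allowFilter setting, and it also sets uploadpack.allowAnySHA1InWant so that on-demand fetches of single objects work with older protocol versions.

```shell
#!/bin/sh
set -e

# Throwaway workspace; every name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# A tiny "server" repository with two versions of the same file.
git init -q server
cd server
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > a.txt
git add a.txt
git commit -q -m "first"
echo "hello again" > a.txt
git commit -q -am "second"

# Let clients request filtered clones and fetch single objects on demand.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
cd ..

# A file:// URL goes through the regular transport, so the filter is applied.
git clone -q --filter=blob:none "file://$work/server" client
cd client

# With --missing=print, objects that are referenced but not present
# locally are listed with a leading "?".
missing=$(git rev-list --objects --missing=print HEAD | grep -c '^?' || true)
echo "objects not downloaded: $missing"
```

The blob needed to check out HEAD is fetched on demand during the clone, while the older version of a.txt stays on the server until something asks for it.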

Partial clones also provide a mechanism for retrieving missing objects when the user accesses them. As a consequence, a partial clone is likely to grow larger and larger over time. This is where sparse checkouts come into play, by letting the user explicitly set which files they are interested in. For example, to get just a single directory along with its ancestors within a repo, and ignore all the rest, you can execute:

git clone --filter=blob:none <repo>
cd <repo>
git sparse-checkout init
git sparse-checkout set <path-to-directory>

The git sparse-checkout set command accepts generic file patterns, in the same way as .gitignore does. This has a downside: when you have many such patterns, it may become computationally expensive for Git to determine which paths are to be excluded. This is partially solved by enabling Git's "cone" mode through the core.sparseCheckoutCone configuration setting, which restricts the patterns allowed in sparse checkouts to whole directories. Sparse checkouts are highly experimental and not yet supported by the major Git-in-the-Cloud providers. For a detailed discussion of partial cloning and sparse checkout, check this introduction on the GitHub blog.
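
Cone mode can be tried out on a scratch repository. The sketch below assumes Git 2.25 or later and uses the --cone option of git sparse-checkout init, which turns on core.sparseCheckoutCone for you. The repository layout (docs, src/app, README.txt) and the demo identity are hypothetical.

```shell
#!/bin/sh
set -e

# Throwaway workspace; the layout below is hypothetical.
work=$(mktemp -d)
cd "$work"

git init -q origin-repo
cd origin-repo
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir -p docs src/app
echo "guide" > docs/guide.txt
echo "code" > src/app/main.txt
echo "top" > README.txt
git add .
git commit -q -m "initial"
cd ..

git clone -q "file://$work/origin-repo" sparse
cd sparse

# Enable sparse checkout in cone mode, then keep a single directory.
git sparse-checkout init --cone
git sparse-checkout set src/app

# Inspect the resulting configuration and the selected directories.
git config core.sparseCheckoutCone
git sparse-checkout list

# Files at the repository root and under src/app remain;
# docs/ is no longer present in the working directory.
ls
```

In cone mode the argument to set is a directory rather than a free-form pattern, which is what allows Git to match paths more cheaply.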

As mentioned, Git 2.25 brings many changes, including performance improvements, UI and workflow improvements, and bug fixes. This goes well beyond what can be covered in a short post, so if you are interested in the full detail, do not miss the official release announcement.
