AZ-400: Designing and Implementing Microsoft DevOps Solutions

Configuring and Managing Repositories

Working with large repositories

As projects evolve, repositories often accumulate extensive commit histories and hefty binary assets. These factors can slow down everyday Git operations like clone, fetch, and checkout. This guide explores strategies and tools to keep your workflow snappy—even in very large repositories.

Challenges in Large Repositories

Long histories, numerous branches, and large binaries contribute to degraded performance. Cloning or fetching a repo with hundreds of megabytes (or gigabytes) of data can take minutes or even hours.

The image illustrates challenges in managing large code repositories, highlighting extensive commit histories and the presence of large binary files.

Optimize Clone Time with a Shallow Clone

A shallow clone downloads only the most recent commits and omits deep history:

git clone --depth <number-of-commits> <repository-url>

Replace <number-of-commits> with how many commits you need (e.g., 1 for just the latest). This approach can reduce clone time and disk usage dramatically.

The image illustrates a concept of optimizing history in large repositories, showing folders labeled "Code" and "Clone" connected to a large repository icon. It suggests using shallow cloning to speed up cloning time.

Note

Shallow clones are ideal for read-only CI jobs or quick code reviews, but avoid them if you need full history for debugging or releases.
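As a quick illustration, the following self-contained sketch builds a throwaway local repository (all paths here are invented for the demo), makes a depth-1 clone of it, and then recovers the full history on demand with git fetch --unshallow:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Throwaway "remote" with three commits to clone from.
git init -q -b main origin-repo
cd origin-repo
git config user.email ci@example.com
git config user.name CI
for i in 1 2 3; do echo "$i" > file.txt; git add file.txt; git commit -qm "commit $i"; done
cd ..

# Depth-1 clone keeps only the newest commit.
git clone -q --depth 1 "file://$tmp/origin-repo" shallow
cd shallow
git rev-list --count HEAD    # 1

# If full history is needed later, deepen on demand.
git fetch -q --unshallow
git rev-list --count HEAD    # 3
```

The file:// URL forces Git's smart transport, which is what makes --depth work against a local path in this demo.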

Managing Large Files

Storing large binaries in Git can bloat your .git directory. Two popular tools help:

Git Large File Storage (LFS)

Git LFS moves big assets—audio, video, datasets—to a separate server while keeping lightweight pointers in your repo:

git lfs install

Then track files by pattern:

git lfs track "*.psd"
git add .gitattributes

The image illustrates Git Large File Storage (LFS), showing how it replaces large files like audio, video, and graphics with text pointers in Git, while storing the actual file contents on a remote server.
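Once a pattern is tracked, Git LFS records it in .gitattributes; the entry written for the example above looks like this, and LFS then replaces matching files with small text pointers at commit time:

```
*.psd filter=lfs diff=lfs merge=lfs -text
```

Commit the .gitattributes file so every collaborator uses the same LFS rules.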

Git-Fat

An alternative is Git-Fat, which also offloads large blobs and keeps pointer files in your Git history:

git fat init

Unlike LFS, Git-Fat has no track subcommand; you register patterns directly in .gitattributes:

*.zip filter=fat -text

The image is an infographic about "Git-Fat," a tool for handling large files in Git repositories, showing how it manages audio, video, and graphics files to reduce repository size.

Cross-Repository Sharing

To avoid duplicate code, share common libraries or components across projects. You can use Git submodules, subtrees, or a package manager.

| Method | Description | Example Command |
| --- | --- | --- |
| Submodule | Embed another repo at a fixed path | git submodule add <repo-url> path/to/module |
| Subtree | Merge an external repo into a subdirectory | git subtree add --prefix=lib/my-lib <repo> main |
| Package | Publish and consume via npm, Maven, or NuGet | npm install @myorg/my-lib |
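The submodule row can be sketched end to end with two throwaway local repositories (names like my-lib and app are hypothetical; the protocol.file.allow override is needed only because this demo uses local file:// URLs, which recent Git versions block for submodules by default):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# A hypothetical shared library repository.
git init -q -b main my-lib
(cd my-lib && git config user.email ci@example.com && git config user.name CI \
  && echo 'shared code' > lib.txt && git add . && git commit -qm 'lib v1')

# Embed it in an application repository as a submodule.
git init -q -b main app
cd app
git config user.email ci@example.com && git config user.name CI
git -c protocol.file.allow=always submodule add -q "file://$tmp/my-lib" lib/my-lib
git commit -qm 'add my-lib submodule'
cd ..

# Consumers get the application and the library in one step:
git -c protocol.file.allow=always clone -q --recurse-submodules "file://$tmp/app" app-clone
ls app-clone/lib/my-lib
```

The commit in app pins an exact revision of my-lib, which is the key property that distinguishes submodules from package-manager dependencies.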

The image illustrates cross-repository sharing, showing two repositories (A and B) connected by shared code and components to reduce duplication and improve maintainability.

Sparse Checkout for Partial Working Copies

If you only need a subset of files, sparse checkout lets you clone the full Git history but only check out selected paths:

git clone <repository-url>
cd <repo-directory>
git sparse-checkout init --cone
git sparse-checkout set path/to/folder another/path

This keeps your working tree lean by populating only the directories you specify.
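The steps above can be exercised against a small local repository (directory names src and docs are invented for the demo) to see that only the selected path is materialized:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Repository with two top-level directories.
git init -q -b main big
(cd big && git config user.email ci@example.com && git config user.name CI \
  && mkdir src docs && echo code > src/main.txt && echo manual > docs/guide.txt \
  && git add . && git commit -qm initial)

# Clone without a checkout, restrict the working tree, then materialize it.
git clone -q --no-checkout "file://$tmp/big" partial
cd partial
git sparse-checkout init --cone
git sparse-checkout set src
git checkout -q main
ls    # only src (plus any root-level files) is populated
```

Using --no-checkout avoids briefly populating the whole tree before the sparse patterns take effect.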

The image illustrates the concept of sparse-checkout, showing a diagram with "Repository A" and "Repository B" connected to "Large Repositories," highlighting the ability to check out only a subset of a repository.

Partial Clone to Defer Large Object Downloads

A partial clone avoids downloading all blobs up front and fetches objects on demand:

git clone --filter=blob:none <repository-url>
cd <repo-directory>
git sparse-checkout init --cone       # optional, to combine with sparse checkout

Git will automatically retrieve missing objects the first time you access them.
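A blobless clone can be demonstrated locally as well; the sketch below (with an invented repository containing a simulated large binary) also shows that Git marks the filtered remote as a "promisor", which is what allows missing objects to be fetched lazily later:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Repository containing a (simulated) large binary.
git init -q -b main origin-repo
(cd origin-repo && git config user.email ci@example.com && git config user.name CI \
  && head -c 1048576 /dev/zero > big.bin && echo docs > README.txt \
  && git add . && git commit -qm initial)

# Blobless clone: commits and trees are fetched; file contents are deferred
# until checkout or a history operation actually needs them.
git clone -q --filter=blob:none "file://$tmp/origin-repo" blobless
cd blobless
git config remote.origin.promisor    # true
```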

The image is a flowchart illustrating the process of a partial clone in Git, showing steps to initialize a repository, enable partial clone, and download necessary or all Git objects based on the decision.

Background Prefetch to Keep Objects Up to Date

Enable background prefetching so Git periodically downloads new object data from your remotes, reducing wait times during foreground fetch operations. Inside the repository, run:

git config fetch.writeCommitGraph true
git maintenance start

git maintenance start registers scheduled maintenance with your platform's scheduler, including an hourly prefetch task, while fetch.writeCommitGraph keeps the commit-graph file current so history traversals stay fast. Together they maintain an up-to-date local object database that accelerates subsequent fetches.

The image is an infographic titled "Background Prefetch," illustrating a process for downloading Git object data from large repositories every hour to reduce time for foreground Git fetch calls.
