AZ-400: Designing and Implementing Microsoft DevOps Solutions

Configuring and Managing Repositories

Purge Data from Source Control

In this article, we explore a critical aspect of repository management: purging data from source control. Purging involves removing unnecessary, large, or sensitive files from your repository to maintain a clean, efficient, and secure codebase. This process not only optimizes repository performance but also ensures that committed sensitive information, such as passwords or API keys, is completely expunged from your version history.

Best Practice

Always verify your changes in a separate branch before permanently rewriting your repository history.

Why Purge Files?

Purging files from your repository can offer several benefits:

  • Improved Performance: Reduces repository size and speeds up operations.
  • Accidental Commits: Removes large files committed by mistake.
  • Security: Eliminates files containing sensitive data like passwords or API keys.


Repository Cleanup Tools

There are two primary tools for repository cleanup:

  1. git filter-repo – A versatile tool for selectively rewriting repository history.
  2. BFG Repo-Cleaner – A faster, simpler alternative that is well suited to straightforward cleanups.


Practical Examples

The following examples show how to remove large, unnecessary, or sensitive files from your repository.

Removing Large or Unnecessary Files

To delete a large or unnecessary file from the repository history and reduce its size, use either of the following commands (substitute your own file name):

bfg --delete-files yourfile.ext

or

git filter-repo --path archive.tar.gz --invert-paths
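Before choosing what to delete, it helps to know which files take up the most space across the whole history, not just in the current checkout. The following is a minimal sketch that uses only standard Git plumbing (git rev-list and git cat-file); the function name list_largest_blobs is hypothetical, and the sizes it reports are uncompressed object sizes in bytes.

```shell
# Hypothetical helper: print the largest blobs (stored file versions)
# reachable from any branch or tag, largest first, as "<bytes> <path>".
# Run it inside a clone of the repository.
# Pipeline: list every object with its path, annotate each with type and
# size (%(rest) carries the path through), keep blobs only while dropping
# the type and hash columns, then sort by size.
list_largest_blobs() {
  local count="${1:-10}"  # number of entries to print; defaults to 10
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob [0-9a-f]* //p' |
    sort -rn |
    head -n "$count"
}
```

For example, running list_largest_blobs 20 inside a clone prints the 20 largest file versions in the history; the reported paths are candidates for bfg --delete-files or git filter-repo --path ... --invert-paths.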

Removing Sensitive Content

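If a file containing sensitive content (such as API keys or credentials) has been committed by accident, remove the entire file from history with the same commands. Rewriting history does not invalidate a secret that was already pushed, so also revoke or rotate the exposed credential.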

bfg --delete-files yourfile.ext

or

git filter-repo --path archive.tar.gz --invert-paths

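To replace only the sensitive text (for example, a password inside a configuration file) while keeping the files themselves, list the strings to remove in a file such as passwords.txt and run: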

bfg --replace-text passwords.txt

or

git filter-repo --replace-text passwords.txt
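The passwords.txt file is plain text with one rule per line. The sketch below creates a hypothetical example; the secret values in it are placeholders, not real data. By default, each literal line is replaced with ***REMOVED***; adding ==> sets a different replacement, and a regex: prefix marks the rule as a regular expression. Both bfg --replace-text and git filter-repo --replace-text accept this basic syntax.

```shell
# Hypothetical placeholder values: list the strings that actually leaked.
# Rule forms used below:
#   bare line       -> literal string, replaced with ***REMOVED***
#   old==>new       -> literal string, replaced with "new"
#   regex:pat==>new -> regular expression, replaced with "new"
# The quoted 'EOF' keeps the shell from expanding anything in the rules.
cat > passwords.txt <<'EOF'
MyS3cretP@ssw0rd
EXAMPLE_API_KEY_123==>REDACTED_API_KEY
regex:password=\S+==>password=REDACTED
EOF
```

Because this file itself contains the secrets, keep it outside the repository and delete it once the cleanup is finished.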

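Both tools rewrite every commit that contains the matched files or text, so the IDs of those commits, and of every commit after them, change. Note also that BFG, by default, does not modify the contents of your latest commit (HEAD): remove the file or secret in a normal commit first, then run BFG to clean the older history.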
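BFG rewrites the commits but does not physically delete the old data: the original objects stay in the local repository, reachable through the reflog, until they are expired and garbage-collected. BFG's documentation recommends running the two Git commands below in the cleaned repository before pushing; git filter-repo expires the reflog and runs gc on its own by default, so this step mainly applies after BFG. The wrapper name expunge_unreachable_objects is hypothetical.

```shell
# Hypothetical wrapper around two standard Git commands.
# Expiring every reflog entry makes the old, pre-purge commits unreachable;
# gc with --prune=now then deletes those unreachable objects immediately.
expunge_unreachable_objects() {
  git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive --quiet
}
```

Run expunge_unreachable_objects from inside the rewritten repository; afterwards the purged file versions no longer exist in its object database.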

Final Steps After Purging

After purging the unwanted data, complete these final steps to ensure consistency and integrity across your development environment:

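  1. Force Push the Changes:
    Update your remote repository with the cleaned history. Because the rewrite affects every branch and tag, push all of them rather than only the current branch:

    git push origin --force --all
    git push origin --force --tags

    git filter-repo removes the origin remote after rewriting, so if you used it, re-add the remote first with git remote add origin <url>. In Azure Repos, force pushing also requires the "Force push (rewrite history, delete branches and tags)" permission.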
    
  2. Notify Your Collaborators:
    Ask your team to re-clone the repository so that everyone's local copy reflects the rewritten history. Pushing from an old clone can reintroduce the purged files or secrets.
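To check that the purge took effect, for example in a fresh clone made after the force push, you can search the whole history for the removed path or text. This is a minimal sketch built on git log --all (which covers every branch and tag in the clone) and its -S pickaxe option; the function names path_is_purged and text_is_purged are hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) only if no commit on any branch
# or tag touches the given path.
path_is_purged() {
  [ -z "$(git log --all --format=%H -- "$1")" ]
}

# Hypothetical helper: succeeds only if no commit on any branch or tag
# adds or removes the given literal string (git log -S "pickaxe" search).
text_is_purged() {
  [ -z "$(git log --all --format=%H -S "$1")" ]
}
```

For example, path_is_purged archive.tar.gz && echo "clean" prints "clean" only when no commit still contains that file.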

Important Reminder

After purging data and force pushing the changes, it is crucial to communicate with your team immediately to prevent potential conflicts or outdated references.
