Explains collaboration in SageMaker Studio, comparing EBS/EFS storage limits and advocating using Git inside JupyterLab for versioning, branching, reviews, and reproducible ML workflows.
In this article we cover collaboration patterns inside SageMaker Studio with a focus on managing multiple copies of notebooks and development assets using Git from within the JupyterLab interface. You’ll learn where notebooks are stored, the limits of filesystem-based sharing, and why using Git inside JupyterLab provides a scalable, auditable collaboration workflow for .ipynb, .py and other code assets.
A JupyterLab space in SageMaker Studio runs on a managed compute “app” (the JupyterLab server). Depending on the Studio configuration, the app’s filesystem may be:
Backed by an EBS volume attached to the compute instance, or
Mounted from a shared EFS filesystem.
When using an EBS-backed app, saving a notebook writes to that attached EBS volume; when using an EFS-backed Studio, files are persisted on EFS. Neither EBS nor EFS provides line-level version control, branching, or an integrated code review workflow; those features come from Git.
Important characteristics and limitations of EBS:
EBS volumes are replicated across underlying hardware but are scoped to a single Availability Zone (AZ).
An AZ is a failure domain within a region; EBS volumes do not cross AZ boundaries.
EBS is not a version-control system. Saving files to EBS alone does not provide shareable commit history or collaboration primitives.
Many SageMaker Studio deployments use EFS to provide a persistent home directory that is shared across app sessions. EFS also supports point-in-time backups (snapshots), for example through AWS Backup, but it is not a replacement for Git-based versioning.
Key points about EFS in Studio:
EFS provides shared storage that multiple app sessions can mount.
Snapshots are not the same as granular commit history: snapshots are coarse-grained and not suited to code review workflows.
EFS is accessed over NFS, so custom deployments may need additional mount and network configuration.
JupyterLab is extensible, and SageMaker Studio includes the JupyterLab Git extension. Using Git from inside JupyterLab gives you:
Line-level change tracking (who changed what and when).
The ability to create commits and push/pull to a central Git server (GitHub, GitLab, Bitbucket, or self-hosted).
Support for branching, pull requests (code review), and CI/CD integrations.
Offline local commits that can be pushed when connectivity is available.
Git is a distributed version control system; teams typically maintain a central remote repository and developers clone repositories into their workspaces, commit locally, and push to the central server.
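The clone, commit, and push cycle described above can be sketched in Python with the standard `subprocess` module. This is a minimal illustration, assuming the `git` command-line tool (version 2.23 or later, for `git switch`) is available in the Studio image; the repository URL, branch name, file path, and the helper names `git_workflow_commands` and `run_all` are hypothetical placeholders, not part of SageMaker or the JupyterLab Git extension.

```python
import subprocess
from pathlib import Path


def git_workflow_commands(repo_url, workdir, branch, paths, message):
    """Return the git commands for a clone -> branch -> commit -> push cycle.

    All values are supplied by the caller; nothing here is specific to
    SageMaker Studio.
    """
    git = ["git", "-C", str(workdir)]  # run later commands inside the clone
    return [
        ["git", "clone", repo_url, str(workdir)],
        git + ["switch", "-c", branch],           # create a feature branch
        git + ["add", "--"] + [str(p) for p in paths],
        git + ["commit", "-m", message],          # local commit, works offline
        git + ["push", "-u", "origin", branch],   # publish the branch for review
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    cmds = git_workflow_commands(
        repo_url="https://example.com/team/ml-project.git",
        workdir=Path("ml-project"),
        branch="feature/tune-learning-rate",
        paths=["notebooks/train.ipynb"],
        message="Tune learning rate for baseline model",
    )
    for cmd in cmds:
        print(" ".join(cmd))
    # run_all(cmds)  # uncomment to execute against a real repository
```

Separating command construction from execution keeps the sequence easy to inspect before anything touches a real repository. The same steps map onto the clone, stage, commit, and push actions in the JupyterLab Git extension.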
Git — the version control system and the command-line tool.
Git servers — hosting providers that implement Git over HTTP/SSH and add collaboration features (e.g., GitHub, GitLab, Bitbucket, or private servers).
JupyterLab Git extension — the UI layer inside JupyterLab/Studio that lets you clone, stage, commit, push, and pull without leaving the notebook environment.
jupytext: pairs notebooks with plain-text representations (e.g., .py), making diffs much cleaner — https://jupytext.readthedocs.io
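To see why a paired text file diffs more cleanly than raw `.ipynb` JSON, the sketch below converts a notebook's cells into a percent-style script using only the standard library. It is a simplified illustration of the idea and not jupytext's implementation; the function name `notebook_to_percent_script` is hypothetical, and the code assumes an nbformat 4 notebook already loaded as a dictionary.

```python
def _as_text(source):
    """nbformat stores cell source as either a string or a list of lines."""
    return source if isinstance(source, str) else "".join(source)


def notebook_to_percent_script(nb):
    """Render code and markdown cells as a '# %%' delimited script.

    Outputs and execution counts are dropped, so re-running a notebook
    does not change the text that Git diffs.
    """
    chunks = []
    for cell in nb.get("cells", []):
        text = _as_text(cell.get("source", ""))
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + text)
        elif cell.get("cell_type") == "markdown":
            commented = "\n".join(
                ("# " + line).rstrip() for line in text.split("\n")
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook; a real one would be read from disk with json.load.
example = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Training\n", "Baseline run"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "source": "lr = 0.01\nprint(lr)",
            "outputs": [{"output_type": "stream", "text": ["0.01\n"]}],
        },
    ],
}

print(notebook_to_percent_script(example))
```

Because the execution count and the captured output never reach the text file, a commit that only re-runs cells produces no diff, while a change to `lr` shows up as a single changed line.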
Git enables reproducibility: by tagging commits or checking out specific commit SHAs, you can reproduce experiments with the exact code used for training or evaluation.
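One way to make that link explicit is to store the commit SHA next to each run's parameters and metrics. The sketch below is a minimal example under stated assumptions: it shells out to `git rev-parse HEAD`, the helper names `current_commit_sha` and `build_run_record` are hypothetical, and the parameter and metric values are placeholders rather than real results.

```python
import json
import subprocess
from datetime import datetime, timezone


def current_commit_sha(repo_dir="."):
    """Return the full SHA of HEAD, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def build_run_record(commit_sha, params, metrics):
    """Bundle the code version with the inputs and results of one run."""
    return {
        "commit_sha": commit_sha,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "params": dict(params),
        "metrics": dict(metrics),
    }


record = build_run_record(
    commit_sha=current_commit_sha(),
    params={"learning_rate": 0.01, "epochs": 5},  # placeholder values
    metrics={"val_accuracy": 0.0},                # placeholder, not a real result
)
print(json.dumps(record, indent=2))
```

With the SHA saved alongside the results, checking out that commit later restores the code that produced them. A record whose `commit_sha` is `None` signals that the run happened outside a Git repository and cannot be traced back to a specific version.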
For large datasets, avoid committing binaries to Git. Store bulk data in Amazon S3 (https://aws.amazon.com/s3/) or use a data-versioning system such as DVC (https://dvc.org) and keep pointers/metadata in Git.
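The pointer idea can be sketched with the standard library alone: hash the data file, then commit a small JSON file that records where the data lives and what its checksum is. This is an illustrative sketch, not DVC's file format; the helper names, the pointer layout, and the bucket `example-bucket` are hypothetical, and the code does not upload anything to S3.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path, s3_uri, pointer_path):
    """Write a small JSON pointer that is safe to commit to Git."""
    data_path = Path(data_path)
    pointer = {
        "s3_uri": s3_uri,
        "sha256": sha256_of_file(data_path),
        "size_bytes": data_path.stat().st_size,
    }
    Path(pointer_path).write_text(json.dumps(pointer, indent=2) + "\n")
    return pointer


# Demonstrate with a throwaway file instead of a real dataset.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_bytes(b"x,y\n1,2\n")
    pointer = write_pointer(
        data,
        "s3://example-bucket/datasets/train.csv",  # hypothetical location
        Path(tmp) / "train.csv.pointer.json",
    )
    print(json.dumps(pointer, indent=2))
```

Only the pointer file goes into the repository. The checksum lets a later run confirm that the object fetched from S3 is the same data the commit was trained on.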
SageMaker Studio’s JupyterLab is extensible and includes a Git extension to clone, initialize, commit, and synchronize repositories from the UI.
Studio workspaces use attached storage (EBS or EFS) depending on configuration; these filesystems are not substitutes for Git-based version control.
Git is the appropriate tool for line-level versioning of code and JSON-based notebooks; central Git servers add hosting, access control, and collaboration features.
Combine Git with S3 or data-versioning tools for large datasets, and integrate commits with CI/CD and SageMaker Pipelines for reproducible automation.
Note: This article focuses on collaboration inside SageMaker Studio. The SageMaker code editor is a separate topic. For further reading, see: