Version control systems excel at tracking text files. Developers commit code, review diffs, and merge changes seamlessly. But introduce large binary filesâmachine learning model weights, video assets, compiled binariesâand Git grinds to a halt. Repositories balloon to gigabytes. Clones take hours. Simple operations timeout. Traditional Git stores every version of every file in the repository history. A 100MB file modified ten times consumes 1GB of repository space. Every developer cloning the repository downloads all versions, even if they only need the latest. The distributed nature that makes Git powerful for code becomes a liability for large files. Git Large File Storage (LFS) addresses this problem by replacing large files with small pointer files in the repository. The actual file content lives on a separate server. Developers download only the versions they need. The repository stays small and fast. This approach sounds ideal, but Git LFS introduces complexity, infrastructure requirements, and new failure modes. Understanding when LFS adds valueâand when simpler approaches sufficeâdetermines whether it solves problems or creates them. This article explores the technical challenges of large files in Git, examines how Git LFS works, provides guidance on when to adopt it, and offers alternatives for different scenarios.
The Large File Problem
Gitâs architecture creates fundamental issues with large binary files.
How Git Stores Files
Gitâs storage model optimizes for text:
Object Storage
- Every file version is a blob object
- Stored in
.git/objects/directory - Compressed but complete copies
- Delta compression for similar files Repository Growth
- Each commit adds new blobs
- History contains all versions
- Clone downloads entire history
- No way to fetch partial history Text vs Binary
- Text: Delta compression works well
- Binary: Compression often ineffective
- Small text changes: Small deltas
- Small binary changes: Full new copy When you commit a 10KB source file, Git stores it efficiently. Modify one line, and Git stores only the difference. But binary files rarely compress well. A 500MB machine learning model modified slightly still requires storing another 500MB.
Real-World Impact
Large files create concrete problems:
Scenario: ML Model Training Data science team commits model weights after each training run:
- Initial model: 500MB
- After 20 training iterations: 20 versions
- Repository size: 10GB
- Clone time: 45 minutes on fast connection Impact:
- New team members wait hours to start
- CI/CD pipelines timeout
- Git operations become slow
- Developers avoid pulling updates Cost:
- Lost productivity: 2 hours per developer per week
- Infrastructure: Larger storage, more bandwidth
- Frustration: âGit is brokenâ
Scenario: Game Asset Development Game studio tracks 3D models and textures in Git:
- 100 high-resolution textures: 50MB each
- 50 3D models: 20MB each
- 6 months of history
- Repository size: 15GB Impact:
- Remote developers on slow connections canât work
- Push/pull operations take 30+ minutes
- Merge conflicts in binary files unresolvable
- Team considers abandoning Git Cost:
- Remote work becomes impossible
- Collaboration breaks down
- Version control benefits lost
Scenario: Video Production Video team commits raw footage for version control:
- 4K video clips: 1GB per minute
- 100 clips over project lifetime
- Multiple versions per clip
- Repository size: 500GB Impact:
- GitHub/GitLab storage limits exceeded
- Self-hosted servers need expensive storage
- Backups become expensive and slow
- Repository becomes unmaintainable Cost:
- Storage: $500/month for cloud hosting
- Backup: $200/month
- Developer time: 10 hours/month managing issues
- Total: $1,200/month for one repository
Git LFS Architecture
Git LFS replaces large files with pointers while storing actual content separately.
How LFS Works
The core mechanism is pointer substitution:
Pointer File version https://git-lfs.github.com/spec/v1 oid sha256
size 133742 What Happens- Developer commits large file
- LFS uploads file to LFS server
- Git stores small pointer file (130 bytes)
- Repository stays small On Checkout
- Git checks out pointer file
- LFS detects pointer
- LFS downloads actual file from server
- Replaces pointer with real file Benefits
- Repository contains only pointers
- Clone downloads only current version
- History stays lightweight
- Large files stored efficiently The pointer file is tinyâaround 130 bytes regardless of actual file size. A 5GB model weight becomes a 130-byte pointer in Git history. The repository stays fast.
LFS Server Architecture
LFS requires additional infrastructure:
Components
- Git repository: Stores pointers
- LFS server: Manages large files
- Object storage: Stores actual content
- Authentication: Controls access Hosting Options
- GitHub: 1GB free, paid plans available
- GitLab: 10GB free per repository
- Bitbucket: 1GB free, paid plans
- Self-hosted: Full control, more complexity Requirements
- Separate storage from Git
- Network bandwidth for uploads/downloads
- Authentication integration
- Backup strategy Unlike regular Git, which is fully distributed, LFS introduces a centralized component. The LFS server becomes a critical dependency. If itâs down, developers canât access large files.
When to Use Git LFS
LFS solves specific problems but isnât always the right choice.
Good LFS Candidates
LFS works well for certain file types:
Game studios and design teams work with binary assets that change frequently during development. A 3D character model might go through dozens of iterations as artists refine proportions, textures, and animations. Design files for marketing materials evolve as stakeholders provide feedback. Audio clips get adjusted for timing and mixing. These files are too large for regular Gitâa high-resolution texture might be 50MB, a character model 30MB, a Photoshop composition 100MB. Without LFS, the repository would balloon to gigabytes after a few weeks of development. But these assets need version control. Artists need to roll back changes, compare versions, and collaborate without overwriting each otherâs work. LFS solves this perfectly. The repository stays smallâunder 100MB even with hundreds of assets. Artists commit directly without worrying about repository size. Version history is preserved. When conflicts occur, theyâre visible in the Git workflow. The team gets all the benefits of version control without the performance penalty. Example: Game Development
- Character models: 50MB each
- Texture files: 20MB each
- Audio clips: 10MB each
- Total: 500 files, 15GB of assets
- Repository size with LFS: 80MB
Data scientists training models need to track experiments. A model trained with different hyperparameters produces different weights. Comparing these versions requires keeping multiple checkpoints. Without version control, teams resort to manual naming schemesâmodel_v1.bin, model_v2_final.bin, model_v2_final_actually_final.binâthat quickly become unmaintainable. Model weights typically range from 100MB to 4GB. These files are too large for regular Git but perfect for LFS. The key benefit is linking code to models. When you check out a specific commit, you get both the training code and the model weights it produced. This enables true reproducibilityâyou can verify that a particular model came from specific code and hyperparameters. LFS works well for models up to about 4GBâthe size limit chosen to fit most file systems. Beyond that, specialized tools like DVC or Weights & Biases provide better workflows. But for small to medium models, LFS offers the simplest path to version control. Example: Deep Learning Project
- Model checkpoints: 200MB - 4GB each
- 10 experiments, 5 checkpoints each
- Total: 50 files, 10GB
- Repository size with LFS: 120MB
- Benefit: Code and models stay synchronized
Technical documentation often includes binary assetsâvideo tutorials, architecture diagrams in proprietary formats, PDF exports. These assets should version alongside the code they document. When code changes, documentation updates. Keeping them in sync prevents the common problem of outdated documentation. Documentation assets change less frequently than code, making them ideal for LFS. A video tutorial might be recorded once and updated quarterly. Architecture diagrams evolve with major releases. The moderate file sizesâtypically 10MB to 200MBâand infrequent updates mean LFS storage costs stay low. The alternative is storing documentation separately, but this breaks the connection between code and docs. With LFS, checking out a release tag gives you both the code and the documentation that describes it. Writers can commit directly to the repository. The team maintains a single source of truth. Example: Product Documentation
- Video tutorials: 100MB each
- Diagram sources (Visio, Sketch): 10MB each
- PDF exports: 5MB each
- Total: 50 files, 2GB
- Repository size with LFS: 90MB
When LFS Is Wrong
Many scenarios donât benefit from LFS:
LFS doesnât support partial file downloads. When you check out a file, you download the entire thing. This makes LFS impractical for files larger than about 4GBâthe size limit chosen to fit most file systems including FAT32. A 50GB raw video file takes hours to download on typical connections. A 100GB dataset is simply too large for the LFS workflow. Even if your network can handle it, the LFS server storage costs become prohibitive. Ten versions of a 50GB file consume 500GB of LFS storage. For these files, external storage with references works better. Store the file in S3 or similar object storage. Commit a small metadata file to Git with the storage location and checksum. Download the large file only when needed. This approach supports any file size, enables partial downloads, and costs less at scale. Example: Video Production
- 4K raw footage: 50GB per file
- 20 clips over project lifetime
- Total: 1TB
- LFS cost: Prohibitive
- Better: S3 with manifest in Git
Compiled binaries, packaged applications, and other build outputs shouldnât be in version control at all. These are generated filesâoutputs of the build process, not source inputs. Version control is for sources. Committing build artifacts creates problems. The repository grows with every build. Developers download artifacts they donât need. The history fills with noise. When you need a specific build, you canât tell which source code produced it. Artifact repositories like Artifactory or Nexus solve this properly. They store build outputs with metadata linking them to source commits. You can retrieve any build and trace it back to exact source code. Storage is optimized for binaries. Old artifacts can be automatically cleaned up. This is the right tool for the job. Example: Application Releases
- Compiled binary: 200MB
- Daily builds: 365 per year
- Total: 73GB per year
- Wrong: LFS or Git
- Right: Artifactory with Git tags
LFS storage grows with every version. A 1GB file modified daily creates 365GB of LFS storage per year. Database dumps, log files, and cache files that change frequently become expensive to store and provide little value.
These files donât benefit from version control. You rarely need to compare yesterdayâs database dump to todayâs. Log files are better analyzed with log management tools. Cache files are temporary by nature. Tracking their history wastes storage and provides no benefit.
The solution is simple: donât version these files. Add them to .gitignore. Store them locally or in appropriate systemsâdatabases for data, log aggregators for logs, temporary storage for caches. Version control is for files where history matters.
Example: Development Database
- Database dump: 2GB
- Updated daily during development
- 30 days: 60GB of LFS storage
- Value: Minimal (only need latest)
- Better: Local file, regenerate as needed
LFS requires infrastructure beyond Git. Some corporate networks block LFS endpoints. Some Git hosting providers donât support LFS. Self-hosting requires maintaining an LFS server and object storage. Without LFS infrastructure, you canât push or pull large files. The repository becomes unusable for team members who need those files. This infrastructure dependency is a real limitationâunlike regular Git, which is fully distributed, LFS introduces a centralized component that must be available. If LFS infrastructure isnât available or reliable, use alternative approaches. External storage with references works without special infrastructure. Specialized tools like DVC can use any S3-compatible storage. Sometimes the simplest solution is keeping large files out of version control entirely.
Practical Implementation
Using LFS effectively requires understanding its workflow and limitations.
Setting Up Git LFS
Basic setup is straightforward:
Installation
Install LFS:
git lfs install
Track file types:
git lfs track â.psdâ
git lfs track â.binâ
git lfs track âmodels/*.h5â
Commit tracking configuration:
git add .gitattributes
git commit -m âConfigure LFS trackingâ
What Gets Created
.gitattributes file:
*.psd filter=lfs diff=lfs merge=lfs -text
.bin filter=lfs diff=lfs merge=lfs -text
models/.h5 filter=lfs diff=lfs merge=lfs -text
Using LFS
Add and commit as normal:
git add model.bin
git commit -m âAdd trained modelâ
Push sends to both Git and LFS:
git push origin main
Clone automatically fetches LFS files:
git clone https://github.com/user/repo.git
The .gitattributes file tells Git which files to handle with LFS. Once configured, LFS works transparently for most operations.
Common Workflows
Different scenarios require different approaches:
Selective Checkout Clone without downloading LFS files: GIT_LFS_SKIP_SMUDGE=1 git clone repo.git Download specific files later: git lfs pull âinclude=âmodels/production/â Pruning Old Versions Remove old LFS files from local cache: git lfs prune Keep only recent versions: git lfs prune âverify-remote ârecent Migrating Existing Files Convert existing files to LFS: git lfs migrate import âinclude=â.psdâ Rewrite history (careful!): git lfs migrate import âinclude=â*.binâ âeverything Checking LFS Status See which files are tracked: git lfs ls-files Check LFS storage usage: git lfs env
Troubleshooting Common Issues
LFS introduces new failure modes:
âThis exceeds GitHubâs file size limitâ
- Cause: File committed without LFS tracking
- Solution: Configure
.gitattributesbefore committing - Prevention: Set up LFS tracking early âError downloading objectâ
- Cause: LFS server unreachable or file missing
- Solution: Check network, verify LFS server status
- Workaround: Skip LFS files temporarily âEncountered X file(s) that should have been pointersâ
- Cause: Files committed before LFS was configured
- Solution: Use
git lfs migrateto fix history - Prevention: Configure LFS before first commit Slow Clone/Pull
- Cause: Downloading many large LFS files
- Solution: Use
GIT_LFS_SKIP_SMUDGE=1for selective download - Alternative: Fetch only needed files
Alternatives to Git LFS
Many scenarios have better solutions than LFS.
External Storage with References
For truly large files, store references instead:
Architecture
- Store files in S3, GCS, or similar
- Commit metadata and references in Git
- Download files as needed
- Version through object storage Example Structure repo/ âââ models/ â âââ config.yaml # In Git â âââ download.sh # In Git âââ data/ âââ manifest.json # In Git âââ fetch_data.py # In Git Benefits
- No LFS infrastructure needed
- Supports any file size
- Flexible storage options
- Lower costs at scale
- Partial downloads possible This approach works well for datasets, large models, and video files. The repository stays small and fast. Storage costs are lower. Teams have more flexibility.
Specialized Tools
Different domains have purpose-built solutions:
Machine Learning
- DVC (Data Version Control): Git-like for data/models
- Weights & Biases: Experiment tracking
- MLflow: Model registry
- Hugging Face: Model hosting Game Development
- Perforce: Designed for large binary files
- Plastic SCM: Handles large assets well
- Unity Collaborate: Built for Unity projects Media Production
- Frame.io: Video collaboration
- Dropbox: Simple file sync
- Resilio Sync: P2P file sync Build Artifacts
- Artifactory: Universal artifact repository
- Nexus: Maven/npm/Docker registry
- Docker Hub: Container images These tools solve specific problems better than general-purpose version control. They understand domain requirements and optimize accordingly.
Conclusion
Git LFS solves real problems for teams working with binary assets that need version control. It keeps repositories fast while preserving history for files that would otherwise bloat Git. But LFS isnât a universal solution. It requires infrastructure, adds complexity, and has size limitations. For files larger than 4GB, specialized storage with metadata references works better. For build artifacts, dedicated artifact repositories are more appropriate. For massive datasets, tools like DVC provide better workflows. The key is matching the tool to the problem. Use LFS for binary assets in active developmentâ3D models, design files, small ML models. Use external storage for large static files. Use specialized tools for domain-specific needs. Use nothing for files that donât need version control. Git LFS is powerful when applied correctly. Understanding its strengths and limitations ensures it solves problems rather than creating them.
Use Git LFS when:
- Binary files need version history
- Files are 10MB - 2GB
- Team collaborates on assets
- LFS infrastructure available Use external storage when:
- Files exceed 4GB
- Donât need detailed history
- Partial downloads needed
- Very frequent updates Use specialized tools when:
- Domain-specific requirements
- Advanced features needed
- Team already uses them
- Better workflow fit Use nothing when:
- Files are generated artifacts
- Temporary or cache files
- Can be recreated easily
- No collaboration needed