Git excels at managing code, but projects aren’t always just code. Sometimes you depend on other repositories. Sometimes you need to store massive assets: datasets, images, binaries. Git wasn’t designed for gigabytes of video files or recursive dependency trees. That’s where submodules and Git LFS (Large File Storage) come in. This chapter explores how to use them wisely without turning your repo into chaos.


Submodules: Repos Inside Repos

A submodule is a reference to another Git repository embedded inside your main repo. Instead of copying code, you link it.

git submodule add https://github.com/example/library.git libs/library

This clones the external repo into a libs/library folder and records its URL in a .gitmodules file at the root of your main repo.

Visualization:

flowchart TD
  MainRepo[Main Repo] --> Sub1[Submodule: library repo]

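From the main repo’s point of view, the submodule is not a copy of the code: it is a pointer to one specific commit in another repo. The libs/library folder itself holds a normal clone, checked out at that commit.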
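To see that pointer for yourself, here is a minimal sketch that builds a throwaway sandbox and inspects what the main repo actually records. It assumes a local stand-in repo instead of the GitHub URL above; the sandbox paths and the Demo identity are hypothetical.

```shell
#!/bin/sh
set -eu

# Sandbox only: paths and the Demo identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the external library (a local repo instead of a URL).
git init -q library
echo "library v1" > library/README
git -C library add README
git -C library commit -q -m "library: initial commit"

# The main repo that embeds it.
git init -q main
cd main
# Newer Git restricts local-path submodules by default, hence the override;
# it is not needed for https:// URLs.
git -c protocol.file.allow=always submodule add "$work/library" libs/library
git commit -q -m "chore: add library as submodule"

# .gitmodules maps the submodule path to its URL.
cat .gitmodules

# The tree entry has mode 160000 (a "gitlink"): a commit ID, not file contents.
entry=$(git ls-tree HEAD libs/library)
entry_mode=${entry%% *}
recorded=$(git rev-parse HEAD:libs/library)
library_head=$(git -C "$work/library" rev-parse HEAD)
echo "$entry"
```

The commit ID printed by git ls-tree is the one the main repo pins; it only changes when you stage and commit the submodule path again.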


Initializing and Updating Submodules

After cloning a repo with submodules, you need to initialize them:

git submodule init
git submodule update

Or in one step, including any nested submodules:

git submodule update --init --recursive
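The difference between an uninitialized and an initialized submodule is easiest to see in git submodule status. The sketch below is a sandbox under stated assumptions: the repos are local stand-ins, and the paths and Demo identity are hypothetical.

```shell
#!/bin/sh
set -eu

# Sandbox only: paths and the Demo identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Local-path submodules need this override on newer Git; https:// URLs do not.
allow="protocol.file.allow=always"

work=$(mktemp -d)
cd "$work"

# A library repo and a main repo that embeds it as a submodule.
git init -q library
echo "library v1" > library/README
git -C library add README
git -C library commit -q -m "library: initial commit"

git init -q main
git -C main -c "$allow" submodule add "$work/library" libs/library
git -C main commit -q -m "chore: add library as submodule"

# A plain clone creates the submodule folder but leaves it empty.
git clone -q "$work/main" fresh
cd fresh
before=$(git submodule status)
echo "before: $before"   # a leading "-" means not initialized

# One step: register, clone, and check out every submodule, nested ones too.
git -c "$allow" submodule update --init --recursive
after=$(git submodule status)
echo "after:  $after"
```

If you know up front that a repo uses submodules, git clone --recurse-submodules followed by the repo URL performs the clone and this update together.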

Pros and Cons of Submodules

Pros:

  • Keep external projects separate but linked.
  • Ensure consistent versions across machines.
  • Useful for vendor libraries.

Cons:

  • Extra commands (update, init).
  • Easy to forget to update submodules.
  • Adds complexity for new team members.

Rule of thumb: use sparingly, only when you really need them.
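One optional way to soften the "easy to forget" problem, offered as a sketch rather than a requirement: Git's submodule.recurse setting makes commands such as pull, checkout, and switch update submodules along with the main repo. The demo repo name below is hypothetical.

```shell
#!/bin/sh
set -eu

# Sandbox only: the repo name is hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# Per-repo setting; add --global to apply it to every repo on this machine.
git config submodule.recurse true

# Read the value back to confirm it was stored.
recurse=$(git config --get submodule.recurse)
echo "submodule.recurse = $recurse"
```

The setting does not apply to git clone, so a fresh clone still needs the init and update step.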


Solo Workflow Example with Submodules

You’re building a game engine that depends on a physics library. Instead of copy-pasting, you add it as a submodule. The main repo records the exact commit you added, so the library only changes when you decide to bump it.

git submodule add https://github.com/example/physics.git libs/physics
git commit -m "chore: add physics engine as submodule"

Team Workflow Example with Submodules

On a team, submodules let everyone use the same version of a dependency. But discipline is required: when you update a submodule, commit the new reference in the main repo, and have teammates run git submodule update after pulling so their checkouts match.

cd libs/physics
git fetch --tags
git checkout v2.1.0
cd ../..
git add libs/physics
git commit -m "chore: bump physics engine to v2.1.0"
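What a teammate sees after such a bump can be sketched end to end in a sandbox. Everything here is an assumption for illustration: the physics and game repos are local stand-ins, and the paths, file contents, and Demo identity are hypothetical.

```shell
#!/bin/sh
set -eu

# Sandbox only: repo names, paths, and the Demo identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Local-path submodules need this override on newer Git; https:// URLs do not.
allow="protocol.file.allow=always"

work=$(mktemp -d)
cd "$work"

# A stand-in physics library with two tagged releases.
git init -q physics
cd physics
echo "gravity 2.0.0" > physics.txt
git add physics.txt
git commit -q -m "physics 2.0.0"
git tag v2.0.0
echo "gravity 2.1.0" > physics.txt
git commit -q -am "physics 2.1.0"
git tag v2.1.0
cd "$work"

# The game repo pins the library at v2.0.0.
git init -q game
cd game
git -c "$allow" submodule add "$work/physics" libs/physics
git -C libs/physics checkout -q v2.0.0
git add libs/physics
git commit -q -m "chore: add physics engine at v2.0.0"
cd "$work"

# A teammate clones the game, submodule included.
git -c "$allow" clone -q --recurse-submodules "$work/game" teammate

# You bump the pin and commit the new reference.
cd "$work/game"
git -C libs/physics checkout -q v2.1.0
git add libs/physics
git commit -q -m "chore: bump physics engine to v2.1.0"

# The teammate pulls: the recorded pin moves, the checkout does not.
cd "$work/teammate"
git -c "$allow" pull -q --ff-only
git submodule status   # a leading "+" marks the mismatch

# Updating the submodule checks out the commit the main repo records.
git -c "$allow" submodule update --init --recursive
pinned=$(git rev-parse HEAD:libs/physics)
checked_out=$(git -C libs/physics rev-parse HEAD)
echo "pinned:      $pinned"
echo "checked out: $checked_out"
```

The pull alone is not enough: until the teammate runs the update, their working copy still contains the old library version even though the main repo already points at the new one.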

Git LFS: Large File Storage

Git LFS solves the problem of big files. Instead of bloating your repo with large binaries, LFS replaces them with lightweight pointers. The actual file lives on a special LFS server.
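A pointer is just a few lines of text: a spec version, the SHA-256 of the real file, and its size in bytes. The sketch below mimics that format by hand so it runs without git-lfs installed; in real use the LFS filter generates pointers for you. The file name and contents are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# A hypothetical "large" asset (tiny here so the sketch runs instantly).
printf 'pretend this is a huge video\n' > intro.mp4

# Use whichever SHA-256 tool this system provides.
if command -v sha256sum >/dev/null 2>&1; then
    oid=$(sha256sum intro.mp4 | cut -d' ' -f1)
else
    oid=$(shasum -a 256 intro.mp4 | cut -d' ' -f1)
fi
size=$(wc -c < intro.mp4 | tr -d ' ')

# The three-line pointer that Git would store in place of the file.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
echo "$pointer"
```

However large the asset grows, the pointer stays roughly the same size, which is why history stays small.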

With the git-lfs package installed, enable LFS for your user account (once per machine):

git lfs install

Track specific file types:

git lfs track "*.mp4"
git add .gitattributes

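From now on, newly added .mp4 files are stored via LFS. Files that were already committed stay in regular Git history unless you convert them with git lfs migrate.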
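Tracking works through an ordinary .gitattributes rule, which is why that file must be committed. As a rough sketch, the rule can be written by hand and checked with git check-attr, so this runs even without git-lfs installed; the repo and file names are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
git init -q assets-demo
cd assets-demo

# This is the rule that `git lfs track "*.mp4"` appends to .gitattributes;
# written by hand here so the sketch does not depend on git-lfs.
printf '%s\n' '*.mp4 filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which filter applies to a path (the file need not exist yet).
mp4_filter=$(git check-attr filter -- intro.mp4)
txt_filter=$(git check-attr filter -- notes.txt)
echo "$mp4_filter"
echo "$txt_filter"
```

Running the same check-attr command in a real project is a quick way to confirm a file will go through LFS before you commit it.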

Visualization:

flowchart TD
  Repo[Git Repo] --> Pointer[Pointer file tracked by Git]
  Pointer --> LFS[Actual binary on LFS server]

Pros and Cons of Git LFS

Pros:

  • Handles large files gracefully.
  • Keeps repo size manageable.
  • Integrates with GitHub, GitLab.

Cons:

  • Requires LFS server support.
  • Adds setup overhead.
  • Storage/bandwidth quotas may apply.

Solo Workflow Example with LFS

As a solo developer working with ML datasets, you track .csv and .h5 files with LFS:

git lfs track "*.csv"
git lfs track "*.h5"
git add .gitattributes
git commit -m "chore: enable LFS for datasets"

Your repo stays lean while datasets live in LFS.


Team Workflow Example with LFS

On a team, LFS ensures binary assets like design files or test videos don’t explode repo size. History holds only lightweight pointers, and a clone or pull downloads just the binaries needed for the commit being checked out, not every past version.

Visualization of team LFS flow:

flowchart LR
  DevA[Dev A commits video.mp4] --> Repo[Repo stores pointer]
  Repo --> LFS[Git LFS server stores binary]
  DevB[Dev B pulls repo] --> Repo
  LFS -->|downloads binary| DevB

Alternatives and Caveats

  • For dependencies: consider package managers instead of submodules.
  • For large files: consider artifact stores (S3, GCS) if LFS quotas are restrictive.

Submodules and LFS are powerful, but not silver bullets.


Think Different Mindset

Submodules and LFS remind us that not everything belongs directly in history. Sometimes we link, sometimes we offload. Git teaches restraint: store text, reference the rest. Think of submodules as citations in a paper, and LFS as archives in a library. You don’t rewrite them—you reference them.


Submodules let you nest repos, LFS lets you tame big files. Both extend Git beyond code. Use them sparingly, wisely, and with discipline. In the next chapter, we’ll explore Git hooks and automation—how to make Git itself an active teammate.