Lesson 7 of 51 ~20 min

The Anatomy of Git — Commits, Blobs, Trees, SHA-1, HEAD, and the Staging Area

Explore the core building blocks of Git: commits, blobs, trees, SHA-1 hashes, HEAD, and the staging area. Learn how these components work together to power Git's unique version control model.

Git can feel like magic when you first meet it. You type git add, git commit, and something mysterious happens: files freeze in amber, history grows another node, and your repository evolves. But beneath that magic lies a remarkably elegant data structure. To truly master Git, you need to peek under the hood. This chapter is about anatomy: the organs, bones, and nerves that make Git the organism it is.

Where Subversion has a linear history of diffs and CVS feels like a logbook, Git is a database of objects. Understanding those objects—blobs, trees, commits—transforms Git from a black box into a glass box. You stop fearing what happens when you type a command, because you know what’s actually being stored and linked.


Commits as Historical Anchors

A commit is more than just “I saved my work.” It is a node in a graph that captures the state of the project. Each commit stores:

  • A pointer to a tree (snapshot of directory state).
  • References to zero or more parent commits (none for the first commit, two or more for a merge).
  • Metadata (author, committer, timestamp).
  • A commit message (your explanation).

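Commits are identified by SHA-1 hashes: cryptographic fingerprints computed over the commit’s full content. If any detail changes (message, tree, parents, even a timestamp), the hash changes. A commit therefore cannot be altered in place; rewriting one produces a new commit with a new hash. This immutability is why Git history is trustworthy.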

git cat-file -p HEAD

Output might look like:

tree 6dcb09b5b57875f334f61aebed695e2e4193db5e
parent 553c09f774a0853c56e8dd1b0b9b3e6e5c1f0a09
author Jakub J <jakub@example.com> 1694331612 +0200
committer Jakub J <jakub@example.com> 1694331612 +0200

feat: add login feature

That is the anatomy of a commit laid bare. It is a graph node pointing to a tree, linked back to history.
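To make the link between content and hash concrete, here is a minimal Python sketch that serializes a commit in the layout shown above and names it the way Git names objects: SHA-1 over a `commit <size>` header, a NUL byte, and the body. The helper names `object_id` and `build_commit` are made up for this illustration. Real commits can carry extra headers (for example a signature), so the printed IDs are not claimed to match any real repository.

```python
import hashlib


def object_id(obj_type, body):
    """SHA-1 over '<type> <size>', a NUL byte, then the body, as Git names objects."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def build_commit(tree, parents, author, committer, message):
    """Serialize a commit body in the layout printed by `git cat-file -p`."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# Values copied from the sample output above.
who = "Jakub J <jakub@example.com> 1694331612 +0200"
fields = dict(
    tree="6dcb09b5b57875f334f61aebed695e2e4193db5e",
    parents=["553c09f774a0853c56e8dd1b0b9b3e6e5c1f0a09"],
    author=who,
    committer=who,
)

original = object_id("commit", build_commit(message="feat: add login feature", **fields))
reworded = object_id("commit", build_commit(message="feat: add login page", **fields))

print(original)
print(reworded)
print(original != reworded)  # True: a different message yields a different commit ID
```

Only the message differs between the two calls, yet the IDs are unrelated. Any child commit that names this commit as its parent would get a new ID too, which is how a single edit ripples through all later history.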


Blobs: The DNA of Files

Blobs are the simplest Git object. They are just file contents—raw text or binary. No filenames, no directories, no timestamps. Pure data. Git assigns each blob a SHA-1 hash. Two identical files, even across different repos, will have the same blob hash.

Visualization:

flowchart TD
  File1[Blob: hello.txt] --> Hash1[SHA-1: a1b2c3]
  File2[Blob: world.txt] --> Hash2[SHA-1: d4e5f6]
  Hash1 --> Tree[Tree Object]
  Hash2 --> Tree
  Tree --> Commit[Commit Object]

Blobs are deduplicated, so storing the same file twice doesn’t cost extra space. This makes Git efficient.
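The deduplication follows directly from how a blob is named. As a small sketch (the function name `blob_id` is chosen here for illustration), a blob's ID is the SHA-1 of the header `blob <size>`, a NUL byte, and the raw file bytes. The filename never enters the hash.

```python
import hashlib


def blob_id(content):
    """Compute the ID Git would assign to a blob holding these bytes."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


hello_txt = blob_id(b"hello world\n")   # imagine this is hello.txt
copy_txt = blob_id(b"hello world\n")    # same bytes saved under another name
world_txt = blob_id(b"hello world!\n")  # one extra character

print(hello_txt)               # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(hello_txt == copy_txt)   # True: identical content, identical blob
print(hello_txt == world_txt)  # False
```

The first value should match what `git hash-object` reports for a file containing `hello world` followed by a newline, in any repository that uses SHA-1.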


Trees: The Structure of Directories

Trees glue blobs together into directory hierarchies. A tree is like a folder: it contains references to blobs (files) and other trees (subdirectories). Each entry in a tree stores:

  • File mode (the entry type plus the executable bit; Git does not store full Unix permissions).
  • Filename.
  • SHA-1 of blob or subtree.

Example tree output (as printed by git ls-tree, which lists entries sorted by name):

100644 blob aaf4c61d...    main.py
040000 tree f14c92a1...    src
100644 blob bbc1d23f...    utils.py

Visualization:

flowchart TD
  T[Tree] --> B1[Blob: main.py]
  T --> B2[Blob: utils.py]
  T --> Sub[Tree: src]

Trees allow Git to reconstruct full directory structures at any point in history.
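A tree object is just those entries written back to back. The sketch below is an illustration only: the file contents and the helper names `object_id` and `tree_body` are assumptions. It serializes each entry as mode, space, name, NUL, and the 20-byte binary hash, then hashes the result with a `tree` header.

```python
import hashlib


def object_id(obj_type, body):
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries into a raw tree body."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, treating directories as if they ended in '/'.
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry: ASCII mode, space, name, NUL, then the 20-byte binary hash.
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


# Hypothetical file contents, used only to have something to hash.
main_py = object_id("blob", b"print('main')\n")
utils_py = object_id("blob", b"print('utils')\n")
helper_py = object_id("blob", b"print('helper')\n")

src_tree = object_id("tree", tree_body([("100644", "helper.py", helper_py)]))
root_body = tree_body([
    ("100644", "utils.py", utils_py),
    ("40000", "src", src_tree),   # raw trees store the directory mode without a leading 0
    ("100644", "main.py", main_py),
])
root_tree = object_id("tree", root_body)

print(root_tree)
print(object_id("tree", tree_body([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```

Because the root tree embeds the hash of `src`, changing any file inside `src` changes the subtree's ID and therefore the root tree's ID, while untouched blobs keep theirs.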


SHA-1: The Cryptographic Spine

Every object (commit, tree, blob) is named by its SHA-1 hash. This ensures integrity: if one bit changes, the hash changes, so Git can detect corruption by rehashing an object and comparing the result to its name (git fsck does this across the whole repository). SHA-1 also provides deduplication: identical content yields identical hashes.

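Of course, SHA-1 is no longer considered collision-resistant: researchers have produced practical collisions. For Git, accidental collisions remain astronomically unlikely, and modern Git builds typically use a hardened SHA-1 implementation that detects the known collision attacks, which adds extra safety.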
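The integrity check can be sketched in a few lines. This toy store is an assumption-laden model, not Git's implementation: it keeps zlib-compressed objects in a dictionary (loose objects on disk are zlib-compressed too) and verifies one by decompressing it and rehashing. The names `store` and `verify` are invented for the example.

```python
import hashlib
import zlib


def store(objects, obj_type, body):
    """Save an object under its own SHA-1 and return that ID."""
    raw = obj_type + b" " + str(len(body)).encode() + b"\x00" + body
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = zlib.compress(raw)
    return oid


def verify(objects, oid):
    """Return True if the stored bytes still hash to the name they are filed under."""
    try:
        raw = zlib.decompress(objects[oid])
    except zlib.error:
        return False
    return hashlib.sha1(raw).hexdigest() == oid


objects = {}
oid = store(objects, b"blob", b"important data\n")
ok_before = verify(objects, oid)

# Simulate corruption: flip one bit of the content, keep the old name.
damaged = bytearray(zlib.decompress(objects[oid]))
damaged[-1] ^= 0x01
objects[oid] = zlib.compress(bytes(damaged))
ok_after = verify(objects, oid)

print(ok_before)  # True
print(ok_after)   # False
```

The object's name doubles as its checksum, so no separate integrity metadata is needed.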


HEAD: The Consciousness Pointer

HEAD is a symbolic reference pointing to your current branch tip (or directly to a commit in “detached HEAD” state). Think of it as your current “point of view.”

git symbolic-ref HEAD
# refs/heads/main

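When you run git checkout (or git switch), you move HEAD. When you commit, Git attaches the new commit where HEAD points and, if HEAD is on a branch, advances that branch. Detached HEAD means you’re exploring history without being anchored to a branch; commits made there belong to no branch until you create one.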

Visualization:

flowchart LR
  A[Commit A] --> B[Commit B]
  B --> C[Commit C]
  C --> D[Commit D]
  HEAD([HEAD]) -.-> D

HEAD is the compass needle telling Git where “you” are in the DAG.
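On disk, HEAD is a small text file, `.git/HEAD`. It holds either `ref: refs/heads/<branch>` or, in detached state, a bare commit hash. The sketch below parses both forms; the function names and the `refs` dictionary are stand-ins for illustration, the commit ID is borrowed from the earlier sample output, and the 40-character check assumes a SHA-1 repository.

```python
import re


def parse_head(text):
    """Interpret the contents of a .git/HEAD file."""
    text = text.strip()
    if text.startswith("ref: "):
        return ("symbolic", text[len("ref: "):])
    if re.fullmatch(r"[0-9a-f]{40}", text):
        return ("detached", text)
    raise ValueError(f"unrecognized HEAD contents: {text!r}")


def resolve_head(text, refs):
    """Return the commit ID HEAD ends up at, given a mapping of ref names to IDs."""
    kind, value = parse_head(text)
    return refs[value] if kind == "symbolic" else value


# Illustrative ref table: branch name -> commit ID.
refs = {"refs/heads/main": "553c09f774a0853c56e8dd1b0b9b3e6e5c1f0a09"}

on_branch = parse_head("ref: refs/heads/main\n")
detached = parse_head("553c09f774a0853c56e8dd1b0b9b3e6e5c1f0a09\n")

print(on_branch)  # ('symbolic', 'refs/heads/main')
print(detached)   # ('detached', '553c09f774a0853c56e8dd1b0b9b3e6e5c1f0a09')
print(resolve_head("ref: refs/heads/main\n", refs))
```

In the symbolic case there are two hops (HEAD to branch, branch to commit), which is why a new commit can advance the branch while HEAD itself stays unchanged.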


The Staging Area: Git’s Sketchpad

The staging area (a.k.a. the index) is the middleman between the working directory and the repository. You don’t commit files directly; you stage them first, and the commit records what is staged. This lets you curate commits, group changes logically, and avoid lumping unrelated edits together.

git add main.py
git status

Status output shows what’s staged vs. unstaged. This is Git’s two-step workflow: edit → stage → commit.

Visualization:

flowchart TD
  WD[Working Directory] -->|git add| Index[Staging Area]
  Index -->|git commit| Repo[Repository]

The staging area empowers you to craft meaningful commits instead of dumping everything into one commit.
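One way to picture what git status reports is as two comparisons: the index against the last commit (staged changes) and the working directory against the index (unstaged changes). The sketch below models all three as plain dictionaries. It is a simplification that ignores deletions and subdirectories, and `status` is a made-up helper, not Git's code.

```python
import hashlib


def blob_id(content):
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def status(head, index, workdir):
    """Classify paths roughly the way `git status` groups them (deletions ignored)."""
    staged = sorted(p for p in index if index[p] != head.get(p))
    unstaged = sorted(p for p in index if p in workdir and blob_id(workdir[p]) != index[p])
    untracked = sorted(p for p in workdir if p not in index)
    return staged, unstaged, untracked


# Last commit: both files at their first version; the index starts out identical.
head = {"main.py": blob_id(b"v1\n"), "utils.py": blob_id(b"v1\n")}
index = dict(head)

# Edit both files, then stage only main.py (the equivalent of `git add main.py`).
workdir = {"main.py": b"v2\n", "utils.py": b"v2\n"}
index["main.py"] = blob_id(workdir["main.py"])
after_add = status(head, index, workdir)
print(after_add)   # (['main.py'], ['utils.py'], [])

# Editing main.py again does not change what was staged.
workdir["main.py"] = b"v3\n"
after_edit = status(head, index, workdir)
print(after_edit)  # (['main.py'], ['main.py', 'utils.py'], [])
```

The second result shows why a file can appear as both staged and unstaged: the index holds a snapshot taken at the moment of git add, not a live link to the file.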


Putting It All Together

Here’s how it flows:

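  1. You edit files in the working directory.
  2. git add writes each file’s content to the object database as a blob and records the blob’s hash in the staging area.
  3. git commit builds a tree (plus any subtrees) from the staged entries, then creates a commit pointing to that tree.
  4. The new commit links to its parent commit(s).
  5. The current branch advances to the new commit, and HEAD follows it.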

Visualization:

flowchart TD
  WD[Working Dir] -->|add| Index[Staging]
  Index -->|commit| T[Tree]
  T --> B1[Blob: file1]
  T --> B2[Blob: file2]
  C[Commit] -->|points to| T
  HEAD --> C

This is Git’s cycle: edit → stage → commit → repeat.
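The whole cycle fits in a toy model. The `MiniRepo` class below is a hypothetical in-memory sketch, not how Git is implemented: it supports only flat trees with regular files, uses a fixed made-up author and timestamp so runs are repeatable, and keeps objects uncompressed in a dictionary.

```python
import hashlib


class MiniRepo:
    """Toy model of Git's object store, index, and refs (flat trees only)."""

    def __init__(self):
        self.objects = {}              # object ID -> (type, body)
        self.index = {}                # path -> blob ID (the staging area)
        self.refs = {}                 # e.g. "refs/heads/main" -> commit ID
        self.head = "refs/heads/main"  # HEAD as a symbolic ref

    def _write(self, obj_type, body):
        header = f"{obj_type} {len(body)}".encode() + b"\x00"
        oid = hashlib.sha1(header + body).hexdigest()
        self.objects[oid] = (obj_type, body)
        return oid

    def add(self, path, content):
        """Like `git add`: store a blob and record its ID in the index."""
        oid = self._write("blob", content)
        self.index[path] = oid
        return oid

    def commit(self, message):
        """Like `git commit`: tree from the index, commit on top, branch advances."""
        entries = sorted(self.index.items(), key=lambda item: item[0].encode())
        tree = b"".join(
            b"100644 " + path.encode() + b"\x00" + bytes.fromhex(oid)
            for path, oid in entries
        )
        tree_id = self._write("tree", tree)
        lines = [f"tree {tree_id}"]
        parent = self.refs.get(self.head)
        if parent is not None:
            lines.append(f"parent {parent}")
        # Hypothetical fixed identity and timestamp, so the example is deterministic.
        who = "Example Dev <dev@example.com> 1700000000 +0000"
        lines += [f"author {who}", f"committer {who}"]
        body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
        commit_id = self._write("commit", body)
        self.refs[self.head] = commit_id  # the branch moves; HEAD follows it
        return commit_id


repo = MiniRepo()
repo.add("file1", b"one\n")
repo.add("file2", b"two\n")
first = repo.commit("initial commit")

repo.add("file1", b"one, edited\n")
second = repo.commit("edit file1")

print(repo.refs[repo.head] == second)  # True
print(len(repo.objects))               # 7: three blobs, two trees, two commits
```

The second commit added only one new blob. `file2` was not restaged, so both trees point at the same blob for it, which is the snapshot model and deduplication working together.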


Solo Workflow Illustration

You edit index.html and style.css. You only want to commit index.html.

  • Stage only that file.
  • Commit.
  • Style changes remain unstaged.

git add index.html
git commit -m "feat: update homepage HTML"

That precision is why staging exists. You can separate changes into meaningful chunks.


Team Workflow Illustration

On a team, staging becomes a way to maintain discipline. Instead of dumping all changes, devs curate commits that reviewers can digest. This makes pull requests cleaner, merges easier, and history understandable.

Visualization of multiple devs staging different subsets:

flowchart LR
  DevA[Dev A WD] -->|stage HTML| IndexA[Index A]
  DevB[Dev B WD] -->|stage JS| IndexB[Index B]
  IndexA --> CommitA[Commit A]
  IndexB --> CommitB[Commit B]

Each developer curates their own history, then merges it into the shared repository.


Think Different Mindset

Git’s anatomy teaches a powerful lesson: history is content-addressed. Instead of trusting a central ledger, you trust hashes. Instead of scattering diffs, you store snapshots. Instead of dumping everything, you curate commits. The anatomy of Git is not just technical; it’s philosophical. It teaches precision, integrity, and authorship.


Now you understand the beating heart of Git. Commits are graph nodes, blobs are DNA, trees are skeletons, SHA-1 is the fingerprint, HEAD is the consciousness pointer, and the staging area is the sketchpad. Together, they make Git what it is: a distributed history machine. Once you know this anatomy, commands stop being magic spells. They become deliberate manipulations of a living organism you control.