Inside Git: How It Works and the Role of the .git Folder Explained

Ever wondered what actually happens when you run git add or git commit? This deep-dive explores Git's internal architecture, demystifies the .git folder, and reveals how Git uses blobs, trees, and commits to track your code's history with cryptographic precision.

In my previous post, we learned how to use Git. But have you ever wondered what's actually happening under the hood? What is that mysterious .git folder doing? Why does Git use weird hexadecimal strings everywhere? How does it know exactly what changed in your files?

Today, we're going on a journey inside Git's brain. By the end of this post, you'll understand the elegant simplicity that powers one of the most important tools in software development. Trust me, once you "get" how Git works internally, the commands will make so much more sense.

The Mystery of the .git Folder

Let's start with the elephant in the room - that .git folder that appears when you run git init.

What is the .git Folder?

The .git folder is Git's database. It's where Git stores everything it knows about your project's history - every commit, every file version, every branch, everything.

Here's the beautiful part: your project's entire committed history is contained in this one folder. Deleted a working file by accident? No problem, Git can restore the last committed version from .git. Your project's timeline, all branches, all commits - it's all right there.

Think of it like this:

Your working directory (the files you see and edit) = Your desk
The .git folder = A filing cabinet with perfect records of everything

Why Does It Exist?

Without the .git folder, Git would be useless. This folder is the reason Git can:

Remember every change you've ever committed
Show you what files looked like 6 months ago
Track who changed what and when
Merge different people's work together
Create and switch between branches instantly
Restore deleted files

When you run git init, Git creates this folder and says: "I'm ready to track your project now!"

Let's Peek Inside

Run this in any Git repository:

bash

ls -la .git/

You'll see something like:

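.git/
├── HEAD              ← Points to your current branch
├── config            ← Repository settings
├── description       ← Repository description
├── index             ← Staging area (more on this later!)
├── hooks/            ← Scripts that run at certain Git events
├── info/             ← Repository info and exclude patterns
├── objects/          ← THE HEART: Where all your data lives
├── refs/             ← Pointers to commits (branches and tags)
│   ├── heads/        ← Your branches live here
│   └── tags/         ← Your tags live here
└── logs/             ← History of where branches have pointed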

Don't worry if this looks overwhelming. We'll break down the important parts piece by piece.

Git Objects: The Building Blocks

Here's where it gets fascinating. Git stores everything as objects in the objects/ folder. There are only four types of objects, and understanding them is the key to understanding Git.

The Three Essential Objects

Blob (Binary Large Object)
Tree
Commit

(There's also a fourth type called "tag", but we'll focus on these three)

Let me explain each one with a real-world analogy.

1. Blob: The File Content

A blob is how Git stores your actual file content. Just the content - no filename, no folder location, just the raw data.

Analogy: Think of a blob like a book without a cover. It has all the content (the pages), but no title or author information.

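When you stage a file with git add, Git:

Takes the file content
Prepends a short header (the word blob plus the content size) and computes a hash of the result
Compresses it with zlib
Stores it in objects/ as a blob, named after the hash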

Key insight: Git doesn't store filenames in blobs! It stores them in trees (next section).

Example:

bash

# Let's say you have index.html with this content:
# <h1>Hello World</h1>

# Git creates a blob object containing that exact text
# The blob gets a unique hash like: 557db03de997c86a4a028e1ebd3a1ceb225be238

Mind-blowing fact: If you have the exact same file content in 100 different places in your project, Git only stores it once as a single blob. Git is smart about deduplication!
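You can reproduce the blob ID outside Git. Here is a minimal sketch, assuming the default SHA-1 object format; the function name git_blob_hash is made up for this post and is not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the ID Git would assign to a blob with this content (SHA-1 repos)."""
    # Git hashes a short header plus the raw bytes: b"blob <size>\0<content>"
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


page = b"<h1>Hello World</h1>\n"
copy_elsewhere = b"<h1>Hello World</h1>\n"

print(git_blob_hash(page))
# Identical content gives the same ID, whatever the file is called
print(git_blob_hash(page) == git_blob_hash(copy_elsewhere))
# The empty blob has a well-known ID
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

To check it against the real thing, compare the output with git hash-object run on a file holding the same bytes.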

2. Tree: The Directory Structure

A tree object represents a directory. It contains:

References to blobs (files)
References to other trees (subdirectories)
Filenames
File permissions

Analogy: A tree is like a table of contents. It says: "In this directory, there's a file called index.html that points to this blob, and a subdirectory called css/ that points to another tree."

Structure:

Tree Object for my-project/
├── blob 557db03  index.html
├── blob 8a3f2bc  README.md
└── tree 4d5e6a7  css/
    └── blob 9f8e7d6  style.css

Example breakdown:

bash

# Your project folder:
my-project/
├── index.html       ← Stored as blob 557db03
├── README.md        ← Stored as blob 8a3f2bc
└── css/
    └── style.css    ← Stored as blob 9f8e7d6

# Git creates:
# - 3 blob objects (for the 3 files)
# - 2 tree objects (for my-project/ and css/)

Key insight: Trees give context to blobs. They're what turn "random file content" into "this is index.html in the root directory."
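To make the tree idea concrete, here is a rough sketch of how a tree object's bytes can be assembled and hashed. It assumes SHA-1 repositories and regular, non-executable files (mode 100644); the helper names and file contents are invented for illustration.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: '<kind> <size>' + NUL byte + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, object_id) entries into a tree object's body."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts directory names as if they ended with "/"
        return (name + "/").encode() if mode == "40000" else name.encode()

    body = b""
    for mode, name, object_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>" + NUL byte + the 20 raw hash bytes
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(object_id)
    return body


css_tree = [("100644", "style.css", git_object_id("blob", b"h1 { color: blue; }\n"))]
root_tree = [
    ("100644", "index.html", git_object_id("blob", b"<h1>Hello World</h1>\n")),
    ("100644", "README.md", git_object_id("blob", b"# My Project\n")),
    ("40000", "css", git_object_id("tree", tree_body(css_tree))),
]

print("css/ tree:", git_object_id("tree", tree_body(css_tree)))
print("root tree:", git_object_id("tree", tree_body(root_tree)))
```

Notice that the filenames live in the tree's bytes, next to each blob's hash, and nowhere in the blobs themselves.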

Note: You can run the following command in any Git-tracked project to see the top-level tree of the latest commit (add -r to list subdirectories recursively):

bash

git ls-tree HEAD

3. Commit: The Snapshot with Metadata

A commit object is a snapshot of your entire project at a specific moment. It contains:

A pointer to a tree (the root directory of your project)
Pointer(s) to parent commit(s)
Author information
Committer information
Commit message
Timestamp

Analogy: A commit is like a photograph with metadata. The photo shows what your project looked like (the tree), and the metadata tells you when it was taken, who took it, and what the occasion was (commit message).

Structure:

Commit Object a3f8b2c
├── tree: 4d5e6a7              ← Points to root tree
├── parent: 7b9e1f3            ← Points to previous commit
├── author: Your Name
├── committer: Your Name
├── date: 2025-12-30 14:32:01
└── message: "Add homepage and styling"

The Commit Chain:

C1 ← C2 ← C3 ← C4 (HEAD → main)
│    │    │    │
│    │    │    └─ tree_4
│    │    └────── tree_3
│    └─────────── tree_2
└──────────────── tree_1

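Each commit points to its parent, creating a linked list of history.

Key insight: Commits are immutable (unchangeable). A commit's hash is computed from its content, so changing anything produces a new commit with a new hash rather than altering the old one. Commands like git commit --amend and git rebase work this way too: they create new commits and leave the originals untouched. This is why Git is so reliable - history can't be silently modified.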
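A commit object is just a small piece of text that gets hashed like everything else. The sketch below assumes SHA-1, a single author who is also the committer, and a made-up identity; commit_id is a hypothetical helper, not a Git API.

```python
import hashlib


def commit_id(tree, parents, author, message, timestamp="1735567921 +0000"):
    """Build a commit object's text and return its SHA-1 ID."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp}")
    lines.append(f"committer {author} {timestamp}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries
me = "Your Name <you@example.com>"  # placeholder identity

c1 = commit_id(EMPTY_TREE, [], me, "Initial commit")
c2 = commit_id(EMPTY_TREE, [c1], me, "Add homepage")
c2_reworded = commit_id(EMPTY_TREE, [c1], me, "Add home page")

print(c1)
print(c2)
print(c2 != c2_reworded)  # True: changing any field yields a different commit
```

Because the parent's ID is part of the hashed text, every commit's ID depends on its whole ancestry.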

The Beautiful Relationship: Commits, Trees, and Blobs

Let me show you how these three object types work together with a real example.

Example Project Structure

bash

my-website/
├── index.html          # Contains: <h1>Welcome</h1>
├── about.html          # Contains: <h1>About Us</h1>
└── css/
    └── style.css       # Contains: h1 { color: blue; }

What Git Creates

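Commit a3f8b2c  "Initial website structure"
└── tree 4d5e6a7  my-website/
    ├── blob 1a2b3c  index.html     (<h1>Welcome</h1>)
    ├── blob 4d5e6f  about.html     (<h1>About Us</h1>)
    └── tree 7g8h9i  css/
        └── blob 9j0k1l  style.css  (h1 { color: blue; })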

What this means:

The commit captures the entire state of your project
The trees represent your folder structure
The blobs contain your actual file content
Everything is connected through hash references

The Magic of Hashes

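Notice all those weird strings like a3f8b2c and 1a2b3c? They stand in for SHA-1 hashes - the identifiers Git computes from each object's content. A full SHA-1 hash is 40 hexadecimal characters (digits 0-9 and letters a-f), and Git lets you abbreviate it as long as the prefix is unambiguous. The short IDs in these diagrams are illustrative placeholders, not real hashes.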

How hashes work:

bash

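# Git prepends a short header to your content, then runs it through a cryptographic hash function
Content: "<h1>Welcome</h1>"
↓
SHA-1 hash function (applied to "blob <size>" + a null byte + the content)
↓
Result: 40 hexadecimal characters, e.g. 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b (illustrative)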

Properties of hashes:

Unique: Different content = different hash (accidental collisions are so unlikely that Git treats them as impossible)
Deterministic: Same content = same hash (always)
Tamper-evident: Change one character = completely different hash
Content-addressable: Git uses the hash as the object's location in objects/ (first two characters = directory, remaining 38 = filename)
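These properties are easy to see for yourself. The quick sketch below hashes the raw strings directly with SHA-1 (so the values are not Git blob IDs, which also include a header); it only illustrates how the hash function behaves.

```python
import hashlib

a = hashlib.sha1(b"<h1>Welcome</h1>").hexdigest()
b = hashlib.sha1(b"<h1>Welcome!</h1>").hexdigest()   # one extra character
again = hashlib.sha1(b"<h1>Welcome</h1>").hexdigest()

# Count the positions where the two hex strings disagree
differing = sum(x != y for x, y in zip(a, b))

print(a)
print(b)
print("deterministic:", a == again)
print("hex digits that differ:", differing, "of", len(a))
```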

Example:

bash

# If two people create files with identical content
Person A: index.html → "Hello World"
Person B: main.html → "Hello World"

# Git creates only ONE blob object because the content is identical
# Both filenames point to the same blob hash

This is why Git is incredibly efficient with storage!

How Git Tracks Changes: The Index (Staging Area)

Remember the staging area from the basics guide? Let's see what it actually is.

The Index File

The index (also called staging area) is a binary file at .git/index. It's a snapshot of your next commit.

Think of it as a draft:

Working Directory: Your current edits (the rough draft)
Index: Changes ready to commit (the final draft)
Repository: Committed history (published versions)

What's Actually in the Index?

The index contains:

List of files in the staging area
Hash of each file's blob
File metadata (permissions, timestamps)
File paths

Example:

bash

# Before git add
Index: (empty or contains previous commit's state)

# After: git add index.html
Index:
├── index.html → blob 1a2b3c (staged, ready to commit)

# After: git add about.html css/style.css
Index:
├── index.html → blob 1a2b3c
├── about.html → blob 4d5e6f
└── css/style.css → blob 9j0k1l

The index is Git's way of asking: "Are you sure you want to commit these specific changes?"
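Since the index is a binary file, you can't just cat it, but its first 12 bytes are simple to decode: a 4-byte signature, a version number, and the number of entries. Here is a small sketch; parse_index_header is an invented name, and the sample bytes are built by hand rather than read from a real repository.

```python
import struct


def parse_index_header(data: bytes):
    """Decode the fixed 12-byte header at the start of a .git/index file."""
    # Big-endian: 4-byte signature, 32-bit version, 32-bit entry count
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":  # "DIRC" stands for "directory cache"
        raise ValueError("not a Git index file")
    return version, entry_count


# A hand-built header for illustration: index format version 2, three entries
sample_header = struct.pack(">4sII", b"DIRC", 2, 3)
print(parse_index_header(sample_header))  # (2, 3)
```

In a real repository you could pass it the bytes of .git/index instead. For a readable view of the entries themselves, git ls-files --stage prints each path with its mode and blob hash.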

What Happens During git add (Internally)

Let's trace exactly what happens when you run git add index.html.

Step-by-Step Internal Process

bash

$ git add index.html

Git performs these steps:

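Read the file: Git reads index.html from your working directory
Create a blob object: Git hashes the content, compresses it, and stores it under .git/objects/ (first two characters of the hash = directory, the rest = filename)
Update the index: Git records the path, the blob hash, and file metadata (permissions, size, timestamps) in .git/index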

Key takeaways:

git add creates blob objects immediately (not during commit!)
The blob is stored in .git/objects/
The index now "knows" about this staged change
Your working directory file remains unchanged

Verification (Try This!)

bash

# After git add index.html
# Find the blob object:
git hash-object index.html
# Output: 1a2b3c4d5e6f... (the blob hash)

# Verify Git stored it:
find .git/objects -type f
# You'll see: .git/objects/1a/2b3c4d5e6f...

# Read the blob content back:
git cat-file -p 1a2b3c4d5e6f
# Output: <h1>Welcome to my site</h1>

Mind blown? Git already saved your file content even before you commit!
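Here is roughly what that storage step looks like in code. This is a simplified sketch, not Git's implementation: it assumes SHA-1 and loose objects only, writes into a temporary directory that stands in for .git/objects, and uses helper names invented for this post.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, kind: str, content: bytes) -> str:
    """Store content the way git add stores a blob; return the object ID."""
    store = f"{kind} {len(content)}".encode() + b"\x00" + content
    object_id = hashlib.sha1(store).hexdigest()
    # First 2 hex characters name the directory, the remaining 38 the file
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return object_id


def read_loose_object(objects_dir: Path, object_id: str) -> bytes:
    """Inflate a loose object and return its content without the header."""
    raw = zlib.decompress((objects_dir / object_id[:2] / object_id[2:]).read_bytes())
    _header, _, content = raw.partition(b"\x00")
    return content


demo_dir = Path(tempfile.mkdtemp()) / "objects"  # stand-in for .git/objects
blob_id = write_loose_object(demo_dir, "blob", b"<h1>Welcome to my site</h1>\n")

print(blob_id[:2] + "/" + blob_id[2:])
print(read_loose_object(demo_dir, blob_id))
```

The read function is a tiny version of what git cat-file -p does for a blob: find the file by hash, decompress it, and strip the header.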

What Happens During git commit (Internally)

Now let's see what happens when you run git commit -m "Add homepage".

Step-by-Step Internal Process

bash

$ git commit -m "Add homepage"

Git performs these steps:

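Create tree objects: Git reads the index and writes one tree per directory (a root tree plus a nested tree for css/)
Create the commit object: it records the root tree, the parent commit, author, committer, timestamps, and your message
Update the branch reference: Git writes the new commit's hash to .git/refs/heads/main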

Result:

New commit object created: a3f8b2c
New tree objects created: 4d5e6a7 and 7g8h9i
Blob objects (already created during git add)
Branch pointer updated
Your working directory and index remain unchanged
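The interesting part of the commit step is turning the flat list of paths in the index into nested trees. Below is a simplified sketch of that idea, assuming SHA-1, regular files only (mode 100644), and file contents invented for the example; write_tree here is my own function, loosely named after the git write-tree plumbing command.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def write_tree(index: dict, created: list, prefix: str = "") -> str:
    """Turn a flat {path: blob_id} index into nested trees; return the root tree ID."""
    entries = {}   # name -> (mode, id) for this directory level
    subdirs = {}   # directory name -> flat index of everything beneath it
    for path, blob_id in index.items():
        name, _, rest = path.partition("/")
        if rest:
            subdirs.setdefault(name, {})[rest] = blob_id
        else:
            entries[name] = ("100644", blob_id)
    for name, sub_index in subdirs.items():
        entries[name] = ("40000", write_tree(sub_index, created, prefix + name + "/"))

    def sort_key(name):
        # Directories sort as if their name ended in "/"
        return (name + "/" if entries[name][0] == "40000" else name).encode()

    body = b""
    for name in sorted(entries, key=sort_key):
        mode, oid = entries[name]
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    tree_id = object_id("tree", body)
    created.append((prefix or "(root)", tree_id))
    return tree_id


index = {
    "index.html": object_id("blob", b"<h1>Welcome</h1>\n"),
    "about.html": object_id("blob", b"<h1>About Us</h1>\n"),
    "css/style.css": object_id("blob", b"h1 { color: blue; }\n"),
}
trees = []
root = write_tree(index, trees)
for directory, tree_id in trees:
    print(directory, tree_id)
```

Three files in two directories produce exactly two trees, with the css/ tree finished first because the root tree needs its hash.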

The Complete Flow

Let's visualize the entire add → commit process:

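Working Directory → git add → Index (blobs written to objects/) → git commit → Repository (trees and commit written, branch reference updated)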

How Git Knows What Changed

This is where Git's design really shines. Let me show you how Git detects changes with extreme efficiency.

The Hash-Based Comparison

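When you run git status or git diff, Git doesn't start by comparing every file line by line. It first compares hashes to find out which files differ at all. Only then does git diff compute a line-by-line diff, and only for those files.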

Process:

bash

$ git status

What Git does:

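Compares each index entry's blob hash with the hash recorded in the HEAD commit's tree (differences = staged changes)
Compares each working-directory file with its index entry, checking the cached size and modification time first and hashing the file only if those differ (differences = unstaged changes)
Reports files that exist in the working directory but not in the index as untracked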

Why this is brilliant:

Speed: Comparing two 40-character hashes is instant, no matter how large the files are
Accuracy: If the hashes match, the contents are identical for all practical purposes
Efficiency: If a directory's tree hash is unchanged, Git can skip everything beneath it without opening a single file

Content-Addressable Storage

Git uses content-addressable storage - meaning content is stored and retrieved based on its hash.

Traditional file systems:

Filename → Location on disk → Content
"index.html" → /path/to/file → <h1>Hello</h1>

Git's approach:

Content → Hash → Storage location
<h1>Hello</h1> → 1a2b3c4d → .git/objects/1a/2b3c4d

Benefits:

Automatic deduplication (same content = same hash = stored once)
Tamper detection (change content = different hash)
Easy verification (recalculate hash to check integrity)

Detecting Changes: Three Comparisons

Git actually performs three different comparisons:

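Working Directory vs Index: unstaged changes (git diff)
Index vs Repository (HEAD): staged changes (git diff --staged)
Working Directory vs Repository (HEAD): all uncommitted changes (git diff HEAD)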

Example:

bash

# Scenario
1. Committed version: "Hello"      (blob: abc123)
2. Staged version: "Hello World"   (blob: def456)
3. Working version: "Hello World!" (blob: ghi789)

# git status output
Changes to be committed:
  modified:   index.html           ← Index ≠ Repository

Changes not staged for commit:
  modified:   index.html           ← Working ≠ Index

Git compares hashes at each level to show you exactly what's different.
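The comparison logic fits in a few lines if you model each area as a mapping from path to blob ID. This is a deliberately simplified sketch (it ignores deleted files and uses the short labels from the scenario above instead of real hashes); the status function is hypothetical, not how Git is implemented.

```python
def status(head: dict, index: dict, working: dict):
    """Classify paths by comparing blob IDs across the three areas."""
    # Index differs from the last commit: staged
    staged = sorted(p for p in index if head.get(p) != index[p])
    # Working file differs from the index: not staged
    unstaged = sorted(p for p in index if p in working and working[p] != index[p])
    # Present on disk but unknown to the index: untracked
    untracked = sorted(p for p in working if p not in index)
    return staged, unstaged, untracked


head = {"index.html": "abc123"}
index = {"index.html": "def456"}
working = {"index.html": "ghi789", "notes.txt": "000aaa"}

staged, unstaged, untracked = status(head, index, working)
print("Changes to be committed:", staged)
print("Changes not staged for commit:", unstaged)
print("Untracked files:", untracked)
```

The same file shows up in both lists because its staged version differs from the commit and its working version differs from the staged one.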

Branches: Just Pointers!

Here's something that might blow your mind: branches in Git are just files containing a commit hash. (Git may later pack many of them into the single file .git/packed-refs, but the idea is the same.)

What is a Branch?

A branch is a lightweight movable pointer to a commit.

Example:

bash

# The main branch is literally just a file:
$ cat .git/refs/heads/main
a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5

# That's it! Just a commit hash.

Creating a branch:

bash

$ git branch feature-login

# Git creates: .git/refs/heads/feature-login
# Content: (same hash as main)
a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5
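To show how little work that is, here is a toy version of branch creation. It operates on a temporary directory that stands in for .git, uses the illustrative commit ID from the example above, and create_branch is an invented helper (real Git also validates names and writes reflog entries).

```python
import tempfile
from pathlib import Path

git_dir = Path(tempfile.mkdtemp())  # stand-in for a real .git directory
heads = git_dir / "refs" / "heads"
heads.mkdir(parents=True)

commit = "a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5"  # illustrative commit ID
(heads / "main").write_bytes(commit.encode() + b"\n")


def create_branch(name: str, start: str = "main") -> Path:
    """Create a branch by copying the start branch's commit ID into a new ref file."""
    ref = heads / name
    ref.write_bytes((heads / start).read_bytes())
    return ref


ref = create_branch("feature-login")
print(ref.name, "->", ref.read_bytes().decode().strip())
print("size in bytes:", ref.stat().st_size)  # 40 hex characters plus a newline
```

No file content is copied and no objects are created. The new branch is one tiny file.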

Branch Visualization

Commit History:
───────────────
C1 ← C2 ← C3 ← C4 ← C5
                     ↑
                     └─ main (points to C5)
                     └─ feature-login (also points to C5)

.git/refs/heads/main:           a3f8b2c (C5's hash)
.git/refs/heads/feature-login:  a3f8b2c (C5's hash)
.git/HEAD:                      ref: refs/heads/main

When you make a commit on feature-login:

C1 ← C2 ← C3 ← C4 ← C5 ← C6
                     ↑    ↑
                     │    └─ feature-login (moved to C6)
                     └────── main (still at C5)

.git/refs/heads/feature-login:  7g8h9i0 (C6's hash) ← Updated!

Why this is amazing:

Creating a branch is instant (just writing a hash to a file)
Branches use almost no disk space
Switching branches repoints HEAD, then updates the index and working directory to match the target commit

HEAD: Where You Are

HEAD is a special pointer that tells Git "which commit am I currently on?"

bash

$ cat .git/HEAD
ref: refs/heads/main

# HEAD points to the main branch
# main points to commit a3f8b2c
# So you're on commit a3f8b2c

Detached HEAD (when HEAD points directly to a commit):

bash

$ git checkout a3f8b2c
# HEAD is now at a3f8b2c (detached)

$ cat .git/HEAD
a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5

# HEAD points directly to a commit, not a branch
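Resolving HEAD to a commit is a two-step lookup at most. The sketch below handles only the two cases shown above and assumes the branch exists as a loose file (refs stored in packed-refs are ignored); resolve_head is a made-up name, and the directory is a temporary stand-in for .git.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path):
    """Return (ref_name_or_None, commit_id) for a .git-style directory."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                      # e.g. refs/heads/main
        return ref, (git_dir / ref).read_text().strip()
    return None, head                                  # detached: HEAD holds a commit ID


git_dir = Path(tempfile.mkdtemp())
(git_dir / "refs" / "heads").mkdir(parents=True)
commit = "a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5"    # illustrative commit ID
(git_dir / "refs" / "heads" / "main").write_text(commit + "\n")

(git_dir / "HEAD").write_text("ref: refs/heads/main\n")
on_branch = resolve_head(git_dir)

(git_dir / "HEAD").write_text(commit + "\n")
detached = resolve_head(git_dir)

print(on_branch)
print(detached)
```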

Practical Exploration: See It Yourself

Let's do a hands-on exercise to see Git's internals in action.

Exercise: Create and Inspect Git Objects

What you learned:

git add created a blob immediately
git commit created a tree and commit object
All objects are stored in .git/objects/
You can inspect any object with git cat-file -p <hash>

Exercise: See How Deduplication Works

Magic: Git stored the content only once, even though it appears in two different files!

How Git Ensures Integrity

Git uses hashes everywhere, and this provides incredible data integrity.

Tamper Detection

Scenario: Someone tries to modify an old commit

Result: Any modification to a commit changes its hash. Because every child commit stores its parent's hash, the change cascades to all descendant commits, making it obvious something was tampered with.

The Merkle Tree Structure

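Git's object graph is a Merkle structure (a hash tree, or more precisely a Merkle DAG): every commit's hash covers its tree's hash, and every tree's hash covers the hashes of the blobs and subtrees it lists.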

Why this matters:

Change any blob → tree hash changes
Change tree → commit hash changes
Change commit → child commit hash changes
You can verify the entire history by checking one hash (HEAD)

This is the same cryptographic structure used in blockchain!
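You can watch the cascade happen with a one-file project. This is a sketch under simplifying assumptions (SHA-1, a single file called index.html, a fixed made-up author and timestamp); snapshot and oid are helpers invented for this post.

```python
import hashlib


def oid(kind: str, body: bytes) -> str:
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\x00" + body).hexdigest()


def snapshot(file_content: bytes, parent: str = ""):
    """Return (blob, tree, commit) IDs for a one-file project."""
    blob = oid("blob", file_content)
    tree = oid("tree", b"100644 index.html\x00" + bytes.fromhex(blob))
    parent_line = f"parent {parent}\n" if parent else ""
    who = "Your Name <you@example.com> 1735567921 +0000"
    text = (f"tree {tree}\n{parent_line}author {who}\n"
            f"committer {who}\n\nAdd homepage\n")
    return blob, tree, oid("commit", text.encode())


original = snapshot(b"<h1>Welcome</h1>\n")
tampered = snapshot(b"<h1>Welcome!</h1>\n")
for label, before, after in zip(("blob", "tree", "commit"), original, tampered):
    changed = before != after
    print(label, "changed:", changed)

# A child commit embeds its parent's ID, so the change keeps propagating
child_of_original = snapshot(b"<h1>v2</h1>\n", parent=original[2])
child_of_tampered = snapshot(b"<h1>v2</h1>\n", parent=tampered[2])
child_changed = child_of_original[2] != child_of_tampered[2]
print("child commit changed:", child_changed)
```

The child's blob and tree are identical in both cases. Only its commit ID differs, and it differs purely because the parent's ID did.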

Common Misconceptions Clarified

Misconception #1: "Git stores diffs"

Reality: Git stores complete snapshots, not diffs.

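When you commit, Git records the full state of every file, not just what changed. However, Git is smart about storage:

Identical content = same blob = stored once, so unchanged files cost nothing extra in a new commit
Git can compress and pack objects later to save space (packfiles use delta compression internally, but that is a storage detail, not the data model)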
Diffs are calculated on-the-fly when you run git diff or git log -p
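Computing a diff from two complete snapshots is cheap, which is why Git doesn't need to store one. The sketch below uses Python's difflib as a stand-in; Git has its own diff implementation, so treat this as an illustration of the idea rather than Git's algorithm. The file contents are made up.

```python
import difflib

# Two full snapshots of the same file, as Git would keep them in two blobs
old_blob = "<h1>Hello</h1>\n<p>Welcome</p>\n"
new_blob = "<h1>Hello World</h1>\n<p>Welcome</p>\n"

diff = list(difflib.unified_diff(
    old_blob.splitlines(keepends=True),
    new_blob.splitlines(keepends=True),
    fromfile="a/index.html",
    tofile="b/index.html",
))
print("".join(diff), end="")
```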

Misconception #2: "Branches are copies of code"

Reality: Branches are just 41-byte files containing a commit hash.

Creating a branch doesn't copy any files. It just creates a new pointer. This is why branching in Git is instant and uses negligible space.

Misconception #3: "The staging area is unnecessary"

Reality: The index gives you precise control over commits.

The staging area lets you:

Commit only some of your changes
Review exactly what will be committed
Build commits incrementally
Stage parts of files (with git add -p)

Misconception #4: "Git is slow because it checks every file"

Reality: Git uses timestamps and hashes for lightning-fast comparisons.

Git first checks file metadata (size, modification time). Only if that changed does it hash the content. And even then, comparing hashes is instant.

Why Understanding This Matters

You might be wondering: "Why do I need to know all this? Can't I just use Git without understanding internals?"

Fair question! Here's why this knowledge is valuable:

1. Debugging Issues

When something goes wrong, you'll know where to look:

Lost commit? Check git reflog (it's in .git/logs/)
Confused about merge? You understand how commits link together
Worried about disk space? You know Git deduplicates blobs

2. Using Advanced Features

Understanding internals makes advanced commands make sense:

git rebase rewrites commits (creates new commits with new hashes)
git cherry-pick copies a commit's changes (creates a new commit that applies the same diff on top of the current branch)
git gc packs loose objects into packfiles and prunes unreachable ones

3. Building Mental Models

Instead of memorizing commands, you understand why things work:

Why branches are so fast (just pointers!)
Why Git is so reliable (cryptographic hashing)
How Git saves space (content-addressable storage)

4. Confidence

You're not afraid of Git anymore because you know what it's doing. You can experiment, knowing the .git folder has your back.

Key Takeaways

Let's recap the essential concepts:

The Big Ideas

Everything is an object: Blobs (file content), Trees (directories), Commits (snapshots)
Hashes are everywhere: Git uses SHA-1 hashes as unique identifiers and for integrity
The .git folder is Git: Your entire project history lives in this one folder
Commits are snapshots: Git saves complete project states, not just diffs
Branches are pointers: Lightweight, fast, and use almost no disk space
Content-addressable storage: Same content = same hash = stored once
Three-stage workflow: Working Directory → Index → Repository

The Internal Process

git add:
  1. Hash file content
  2. Create blob in objects/
  3. Update index

git commit:
  1. Create tree objects from index
  2. Create commit object pointing to tree
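  3. Update the current branch reference
  4. HEAD follows automatically because it points to the branch (only a detached HEAD is rewritten directly)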

The Mental Model

Commit ────▶ Tree ────▶ Blobs
  │           │
  │           └─────▶ Trees (subdirectories)
  │
  └─ Points to parent commit(s)
     └─ Creates history chain

What's Next?

Now that you understand Git's internals, you're ready for more advanced topics:

Remote repositories: How git push and git pull work with Git's object model
Merge strategies: How Git combines different commit histories
Rebase internals: How Git rewrites commit history
Git packfiles: How Git optimizes storage
Reflog: Your safety net for recovering "lost" commits
Garbage collection: How Git cleans up unused objects

Final Thoughts

Git is beautifully elegant once you understand it. Everything revolves around:

Content-addressable storage (hashes)
Immutable objects (commits, trees, blobs)
Lightweight pointers (branches, tags, HEAD)

The .git folder isn't mysterious anymore - it's a well-organized database of objects and references. When you run git add, you're creating blobs. When you run git commit, you're creating trees and commit objects. When you create a branch, you're just writing a hash to a file.

Git's genius is hiding this complexity behind simple commands while giving you the power of a cryptographically secure, distributed version control system.

Next time you run git commit, you'll know exactly what's happening behind the scenes. And that knowledge? That's power.

Now go explore your .git folder with confidence! 🚀

Want to dive deeper? Try these exercises:

Use git cat-file -p to explore every object in your repository
Watch how blob hashes change when you modify file content
Create multiple branches and observe .git/refs/heads/
Check your reflog with git reflog to see every commit HEAD has recently pointed to

Questions or discoveries? I'd love to hear what you found in the comments!

The Mystery of the .git Folder

Let's start with the elephant in the room - that .git folder that appears when you run git init.

What is the .git Folder?

The .git folder is Git's database. It's where Git stores everything it knows about your project's history - every commit, every file version, every branch, everything.

Think of it like this:

Your working directory (the files you see and edit) = Your desk
The .git folder = A filing cabinet with perfect records of everything

Why Does It Exist?

Without the .git folder, Git would be useless. This folder is the reason Git can:

Remember every change you've ever committed
Show you what files looked like 6 months ago
Track who changed what and when
Merge different people's work together
Create and switch between branches instantly
Restore deleted files

When you run git init, Git creates this folder and says: "I'm ready to track your project now!"

Let's Peek Inside

Run this in any Git repository:

bash

ls -la .git/

You'll see something like:

.git/
├── HEAD              ← Points to your current branch
├── config            ← Repository settings
├── description       ← Repository description
├── index             ← Staging area (more on this later!)
├── hooks/            ← Scripts that run at certain Git events
├── info/             ← Repository info and exclude patterns
├── objects/          ← THE HEART: Where all your data lives
├── refs/             ← Pointers to commits (branches and tags)
│   ├── heads/        ← Your branches live here
│   └── tags/         ← Your tags live here
└── logs/             ← History of where branches have pointed

Don't worry if this looks overwhelming. We'll break down the important parts piece by piece.

Git Objects: The Building Blocks

Here's where it gets fascinating. Git stores everything as objects in the objects/ folder. There are only four types of objects, and understanding them is the key to understanding Git.

The Three Essential Objects

Blob (Binary Large Object)
Tree
Commit

(There's also a fourth type called "tag", but we'll focus on these three)

Let me explain each one with a real-world analogy.

1. Blob: The File Content

A blob is how Git stores your actual file content. Just the content - no filename, no folder location, just the raw data.

Analogy: Think of a blob like a book without a cover. It has all the content (the pages), but no title or author information.

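When you stage a file with git add, Git:

Takes the file content
Prepends a short header (the word blob plus the content size) and computes a hash of the result
Compresses it with zlib
Stores it in objects/ as a blob, named after the hash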

Key insight: Git doesn't store filenames in blobs! It stores them in trees (next section).

Example:

bash

# Let's say you have index.html with this content:
# <h1>Hello World</h1>

# Git creates a blob object containing that exact text
# The blob gets a unique hash like: 557db03de997c86a4a028e1ebd3a1ceb225be238

Mind-blowing fact: If you have the exact same file content in 100 different places in your project, Git only stores it once as a single blob. Git is smart about deduplication!

2. Tree: The Directory Structure

A tree object represents a directory. It contains:

References to blobs (files)
References to other trees (subdirectories)
Filenames
File permissions

Structure:

Tree Object for my-project/
├── blob 557db03  index.html
├── blob 8a3f2bc  README.md
└── tree 4d5e6a7  css/
    └── blob 9f8e7d6  style.css

Example breakdown:

bash

# Your project folder:
my-project/
├── index.html       ← Stored as blob 557db03
├── README.md        ← Stored as blob 8a3f2bc
└── css/
    └── style.css    ← Stored as blob 9f8e7d6

# Git creates:
# - 3 blob objects (for the 3 files)
# - 2 tree objects (for my-project/ and css/)

Key insight: Trees give context to blobs. They're what turn "random file content" into "this is index.html in the root directory."

Note: You can run the following command in any Git-tracked project to see the top-level tree of the latest commit (add -r to list subdirectories recursively):

bash

git ls-tree HEAD

3. Commit: The Snapshot with Metadata

A commit object is a snapshot of your entire project at a specific moment. It contains:

A pointer to a tree (the root directory of your project)
Pointer(s) to parent commit(s)
Author information
Committer information
Commit message
Timestamp

Structure:

Commit Object a3f8b2c
├── tree: 4d5e6a7              ← Points to root tree
├── parent: 7b9e1f3            ← Points to previous commit
├── author: Your Name
├── committer: Your Name
├── date: 2025-12-30 14:32:01
└── message: "Add homepage and styling"

The Commit Chain:

C1 ← C2 ← C3 ← C4 (HEAD → main)
│    │    │    │
│    │    │    └─ tree_4
│    │    └────── tree_3
│    └─────────── tree_2
└──────────────── tree_1

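Each commit points to its parent, creating a linked list of history.

Key insight: Commits are immutable (unchangeable). A commit's hash is computed from its content, so changing anything produces a new commit with a new hash rather than altering the old one. Commands like git commit --amend and git rebase work this way too: they create new commits and leave the originals untouched. This is why Git is so reliable - history can't be silently modified.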

The Beautiful Relationship: Commits, Trees, and Blobs

Let me show you how these three object types work together with a real example.

Example Project Structure

bash

my-website/
├── index.html          # Contains: <h1>Welcome</h1>
├── about.html          # Contains: <h1>About Us</h1>
└── css/
    └── style.css       # Contains: h1 { color: blue; }

What Git Creates

┌─────────────────────────────────────────────────────────────┐
│                    COMMIT OBJECT                             │
│  hash: a3f8b2c                                               │
│  message: "Initial website structure"                       │
│  author: You                                                 │
│  tree: → ┌───────────────────────────────────┐             │
└──────────│   TREE (root: my-website/)         │             │
           │   hash: 4d5e6a7                    │             │
           │                                     │             │
           │   ├─ blob 1a2b3c  index.html       │             │
           │   ├─ blob 4d5e6f  about.html       │             │
           │   └─ tree 7g8h9i  css/             │             │
           └──────────────┬────────────────────┬┘             │
                          │                    │               │
                          ▼                    ▼               │
            ┌──────────────────┐    ┌──────────────────┐     │
            │ BLOB             │    │ TREE (css/)      │     │
            │ hash: 1a2b3c     │    │ hash: 7g8h9i     │     │
            │                  │    │                  │     │
            │ <h1>Welcome</h1> │    │ blob 9j0k1l      │     │
            └──────────────────┘    │   style.css      │     │
                                    └─────────┬────────┘     │
            ┌──────────────────┐             │               │
            │ BLOB             │             ▼               │
            │ hash: 4d5e6f     │   ┌──────────────────┐     │
            │                  │   │ BLOB             │     │
            │ <h1>About Us</h1>│   │ hash: 9j0k1l     │     │
            └──────────────────┘   │                  │     │
                                   │ h1{color: blue;} │     │
                                   └──────────────────┘     │

What this means:

The commit captures the entire state of your project
The trees represent your folder structure
The blobs contain your actual file content
Everything is connected through hash references

The Magic of Hashes

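Notice all those weird strings like a3f8b2c and 1a2b3c? They stand in for SHA-1 hashes - the identifiers Git computes from each object's content. A full SHA-1 hash is 40 hexadecimal characters (digits 0-9 and letters a-f), and Git lets you abbreviate it as long as the prefix is unambiguous. The short IDs in these diagrams are illustrative placeholders, not real hashes.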

How hashes work:

bash

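# Git prepends a short header to your content, then runs it through a cryptographic hash function
Content: "<h1>Welcome</h1>"
↓
SHA-1 hash function (applied to "blob <size>" + a null byte + the content)
↓
Result: 40 hexadecimal characters, e.g. 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b (illustrative)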

Properties of hashes:

Unique: Different content = different hash (accidental collisions are so unlikely that Git treats them as impossible)
Deterministic: Same content = same hash (always)
Tamper-evident: Change one character = completely different hash
Content-addressable: Git uses the hash as the object's location in objects/ (first two characters = directory, remaining 38 = filename)

Example:

bash

# If two people create files with identical content
Person A: index.html → "Hello World"
Person B: main.html → "Hello World"

# Git creates only ONE blob object because the content is identical
# Both filenames point to the same blob hash

This is why Git is incredibly efficient with storage!

How Git Tracks Changes: The Index (Staging Area)

Remember the staging area from the basics guide? Let's see what it actually is.

The Index File

The index (also called staging area) is a binary file at .git/index. It's a snapshot of your next commit.

Think of it as a draft:

Working Directory: Your current edits (the rough draft)
Index: Changes ready to commit (the final draft)
Repository: Committed history (published versions)

What's Actually in the Index?

The index contains:

List of files in the staging area
Hash of each file's blob
File metadata (permissions, timestamps)
File paths

Example:

bash

# Before git add
Index: (empty or contains previous commit's state)

# After: git add index.html
Index:
├── index.html → blob 1a2b3c (staged, ready to commit)

# After: git add about.html css/style.css
Index:
├── index.html → blob 1a2b3c
├── about.html → blob 4d5e6f
└── css/style.css → blob 9j0k1l

The index is Git's way of asking: "Are you sure you want to commit these specific changes?"

What Happens During git add (Internally)

Let's trace exactly what happens when you run git add index.html.

Step-by-Step Internal Process

bash

$ git add index.html

Git performs these steps:

┌──────────────────────────────────────────────────────────┐
│ STEP 1: Read the file                                    │
│ ────────────────────────────────────────────────────────│
│ Git reads index.html from your working directory         │
│ Content: "<h1>Welcome to my site</h1>"                   │
└──────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│ STEP 2: Create a blob object                             │
│ ────────────────────────────────────────────────────────│
│ Git compresses the content and generates a hash          │
│ Hash: 1a2b3c4d...                                        │
│                                                           │
│ Git stores this as: .git/objects/1a/2b3c4d...           │
│ (First 2 chars = directory, rest = filename)            │
└──────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│ STEP 3: Update the index                                 │
│ ────────────────────────────────────────────────────────│
│ Git updates .git/index with:                             │
│   - Filename: index.html                                 │
│   - Blob hash: 1a2b3c4d...                              │
│   - Metadata: permissions, size, timestamps              │
└──────────────────────────────────────────────────────────┘

Key takeaways:

git add creates blob objects immediately (not during commit!)
The blob is stored in .git/objects/
The index now "knows" about this staged change
Your working directory file remains unchanged

Verification (Try This!)

bash

# After git add index.html
# Find the blob object:
git hash-object index.html
# Output: 1a2b3c4d5e6f... (the blob hash)

# Verify Git stored it:
find .git/objects -type f
# You'll see: .git/objects/1a/2b3c4d5e6f...

# Read the blob content back:
git cat-file -p 1a2b3c4d5e6f
# Output: <h1>Welcome to my site</h1>

Mind blown? Git already saved your file content even before you commit!

What Happens During git commit (Internally)

Now let's see what happens when you run git commit -m "Add homepage".

Step-by-Step Internal Process

bash

$ git commit -m "Add homepage"

Git performs these steps:

┌──────────────────────────────────────────────────────────┐
│ STEP 1: Create tree objects                              │
│ ────────────────────────────────────────────────────────│
│ Git looks at the index and creates tree objects          │
│ representing your directory structure                     │
│                                                           │
│ For subdirectories, Git creates nested trees             │
│                                                           │
│ Root tree (4d5e6a7):                                     │
│   ├─ blob 1a2b3c  index.html                            │
│   ├─ blob 4d5e6f  about.html                            │
│   └─ tree 7g8h9i  css/                                   │
│                                                           │
│ css/ tree (7g8h9i):                                      │
│   └─ blob 9j0k1l  style.css                             │
└──────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│ STEP 2: Create commit object                             │
│ ────────────────────────────────────────────────────────│
│ Git creates a commit object containing:                  │
│                                                           │
│ tree 4d5e6a7                    ← Root tree             │
│ parent 7b9e1f3                  ← Previous commit       │
│ author Your Name <you@email> 1735567921 +0000            │
│ committer Your Name <you@email> 1735567921 +0000         │
│                                                           │
│ Add homepage                    ← Your message          │
│                                                           │
│ Hash: a3f8b2c9d...                                      │
└──────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│ STEP 3: Update the branch reference                      │
│ ────────────────────────────────────────────────────────│
│ Git updates .git/refs/heads/main to point to new commit │
│                                                           │
│ Before: .git/refs/heads/main → 7b9e1f3 (old commit)    │
│ After:  .git/refs/heads/main → a3f8b2c (new commit)    │
└──────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│ STEP 4: HEAD follows along (file unchanged)              │
│ ────────────────────────────────────────────────────────│
│ .git/HEAD is not rewritten; it still names the branch,   │
│ which now resolves to the new commit:                    │
│                                                           │
│ .git/HEAD → ref: refs/heads/main → a3f8b2c             │
└──────────────────────────────────────────────────────────┘

Result:

New commit object created: a3f8b2c
New tree objects created: 4d5e6a7 and 7g8h9i
Blob objects (already created during git add)
Branch pointer updated
Your working directory and index remain unchanged
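
If you'd like to see how those tree and commit hashes could be derived, here's a rough Python sketch under the SHA-1 object format. The helper names and the sample identity are assumptions for illustration, and the name sorting is simplified (real Git orders directory entries as if their names ended in "/").

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list) -> bytes:
    """Serialize (mode, name, hex_id) entries into a tree object's body.

    Each entry is '<mode> <name>' + NUL + the 20 raw bytes of the ID.
    """
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return body


def commit_body(tree: str, parents: list, who: str, when: str, message: str) -> bytes:
    """Build the plain-text body of a commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who} {when}", f"committer {who} {when}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


blob = object_id("blob", b"<h1>Welcome to my site</h1>\n")
root = object_id("tree", tree_body([("100644", "index.html", blob)]))
commit = object_id(
    "commit",
    commit_body(root, [], "Your Name <you@email>", "1735567921 +0000", "Add homepage"),
)
print(root, commit)
```

Notice that the commit only mentions the root tree's hash, and the tree only mentions blob hashes. That chain of references is all Git needs to rebuild the whole snapshot.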

The Complete Flow

Let's visualize the entire add → commit process:

┌────────────────────────────────────────────────────────────────┐
│                  COMPLETE GIT ADD → COMMIT FLOW                │
└────────────────────────────────────────────────────────────────┘

INITIAL STATE
─────────────
Working Directory          Index               Repository
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ index.html   │     │ (matches C1) │     │  Commit C1   │
│ (modified)   │     │              │     │              │
└──────────────┘     └──────────────┘     └──────────────┘


AFTER: git add index.html
───────────────────────────
Working Directory          Index               Repository
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ index.html   │────▶│ index.html   │     │  Commit C1   │
│              │     │ → blob 1a2b  │     │              │
└──────────────┘     └──────────────┘     └──────────────┘
                            │                     │
                            └─────────────────────┤
                                                  │
                                    .git/objects/ │
                                    ├─ 1a/2b3c... ◀


AFTER: git commit -m "Update homepage"
───────────────────────────────────────
Working Directory          Index               Repository
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ index.html   │     │ index.html   │     │  Commit C1   │
│              │     │ → blob 1a2b  │     │      ↑       │
└──────────────┘     └──────────────┘     │  Commit C2   │◀── HEAD
                                           │  (new!)      │
                                           └──────────────┘
                                                  │
                                    .git/objects/ │
                                    ├─ 1a/2b3c... │ (blob)
                                    ├─ 4d/5e6a... │ (tree)
                                    └─ a3/f8b2... ◀ (commit)

How Git Knows What Changed

This is where Git's design really shines. Let me show you how Git detects changes with extreme efficiency.

The Hash-Based Comparison

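When you run git status or git diff, Git doesn't start by comparing file contents line by line. To decide whether a file changed at all, it compares hashes. Only for files whose hashes differ does git diff go on to compute the line-level differences.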

Process:

bash

$ git status

What Git does:

┌──────────────────────────────────────────────────────────┐
│ 1. Get committed version hash                            │
│    Git looks at HEAD commit's tree                       │
│    index.html → blob 1a2b3c                             │
└──────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│ 2. Get working directory hash                            │
│    Git hashes current index.html content                 │
│    Result: 4d5e6f                                        │
└──────────────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────────────┐
│ 3. Compare hashes                                         │
│    1a2b3c ≠ 4d5e6f                                       │
│    ↓                                                      │
│    File has changed!                                     │
└──────────────────────────────────────────────────────────┘

Why this is brilliant:

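Speed: Comparing two 40-character hashes is instant, even for gigantic files
Accuracy: If hashes match, the files are identical for all practical purposes
Efficiency: Git usually skips even the hashing, because the file metadata cached in the index (size, modification time) shows that most files are untouched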
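
Here's a rough Python sketch of that idea: check cheap file metadata first, and hash only when it differs. The cached dictionary is a hypothetical stand-in for an index entry; Git's real index format and its handling of edge cases are more involved.

```python
import hashlib
import os


def blob_hash(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(path: str) -> dict:
    """Record what a (hypothetical) index entry would cache for a file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "hash": blob_hash(f.read())}


def is_modified(path: str, cached: dict) -> bool:
    """Cheap metadata check first; hash the content only when it differs."""
    st = os.stat(path)
    if st.st_size == cached["size"] and st.st_mtime_ns == cached["mtime_ns"]:
        return False  # metadata unchanged: assume content unchanged, skip hashing
    with open(path, "rb") as f:
        return blob_hash(f.read()) != cached["hash"]
```

A file that was merely touched (new timestamp, same bytes) falls through to the hash comparison and is correctly reported as unmodified.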

Content-Addressable Storage

Git uses content-addressable storage - meaning content is stored and retrieved based on its hash.

Traditional file systems:

Filename → Location on disk → Content
"index.html" → /path/to/file → <h1>Hello</h1>

Git's approach:

Content → Hash → Storage location
<h1>Hello</h1> → 1a2b3c4d → .git/objects/1a/2b3c4d

Benefits:

Automatic deduplication (same content = same hash = stored once)
Tamper detection (change content = different hash)
Easy verification (recalculate hash to check integrity)
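
To make those three benefits concrete, here's a toy in-memory content-addressable store in Python. It's only a sketch of the idea (the ObjectStore class is my own invention), not how Git lays out its database on disk.

```python
import hashlib


class ObjectStore:
    """A toy content-addressable store: the key is derived from the value."""

    def __init__(self) -> None:
        self._objects = {}

    @staticmethod
    def _key(content: bytes) -> str:
        return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

    def put(self, content: bytes) -> str:
        key = self._key(content)
        self._objects[key] = content  # same content -> same key -> one entry
        return key

    def get(self, key: str) -> bytes:
        content = self._objects[key]
        # Integrity check: recompute the hash and compare it with the key.
        if self._key(content) != key:
            raise ValueError(f"object {key} is corrupt")
        return content

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put(b"<h1>Hello</h1>\n")
second = store.put(b"<h1>Hello</h1>\n")
print(first == second, len(store))
```

Storing the same bytes twice yields the same key and a single entry, which is exactly the deduplication Git gets for free.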

Detecting Changes: Three Comparisons

Git can compare at three different levels (git status reports the first two; git show and git log -p cover the third):

┌─────────────────┐
│ Working         │ ◀─── Compare 1: Modified?
│ Directory       │      (Working vs Index)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Index           │ ◀─── Compare 2: Staged?
│ (Staging Area)  │      (Index vs Repository)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Repository      │ ◀─── Compare 3: Committed?
│ (HEAD commit)   │      (Current vs Previous)
└─────────────────┘

Example:

bash

# Scenario
1. Committed version: "Hello"      (blob: abc123)
2. Staged version: "Hello World"   (blob: def456)
3. Working version: "Hello World!" (blob: ghi789)

# git status output
Changes to be committed:
  modified:   index.html           ← Index ≠ Repository

Changes not staged for commit:
  modified:   index.html           ← Working ≠ Index

Git compares hashes at each level to show you exactly what's different.
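
The scenario above boils down to two hash comparisons per file. Here's a small Python sketch of that decision logic; the classify function is hypothetical and far simpler than real git status (no renames, no ignore rules).

```python
def classify(head_hash, index_hash, working_hash):
    """Return the git-status-style sections a file would appear in.

    Each argument is a blob hash, or None if the file is absent at that level.
    """
    if head_hash is None and index_hash is None:
        # Git has never been told about this file.
        return ["Untracked files"] if working_hash is not None else []
    sections = []
    if index_hash != head_hash:
        sections.append("Changes to be committed")        # Index vs Repository
    if working_hash != index_hash:
        sections.append("Changes not staged for commit")  # Working vs Index
    return sections


print(classify("abc123", "def456", "ghi789"))
```

With the three placeholder hashes from the scenario, the file lands in both sections, matching the git status output shown above.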

Branches: Just Pointers!

Here's something that might blow your mind: branches in Git are just files containing a commit hash.

What is a Branch?

A branch is a lightweight movable pointer to a commit.

Example:

bash

# The main branch is literally just a file:
$ cat .git/refs/heads/main
a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5

# That's it! Just a commit hash.

Creating a branch:

bash

$ git branch feature-login

# Git creates: .git/refs/heads/feature-login
# Content: (same hash as main)
a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5

Branch Visualization

Commit History:
───────────────
C1 ← C2 ← C3 ← C4 ← C5
                     ↑
                     └─ main (points to C5)
                     └─ feature-login (also points to C5)

.git/refs/heads/main:           a3f8b2c (C5's hash)
.git/refs/heads/feature-login:  a3f8b2c (C5's hash)
.git/HEAD:                      ref: refs/heads/main

When you make a commit on feature-login:

C1 ← C2 ← C3 ← C4 ← C5 ← C6
                     ↑    ↑
                     │    └─ feature-login (moved to C6)
                     └────── main (still at C5)

.git/refs/heads/feature-login:  7g8h9i0 (C6's hash) ← Updated!

Why this is amazing:

Creating a branch is instant (just writing a hash to a file)
Branches use almost no disk space
Switching branches repoints HEAD, then updates the index and working directory to match the target commit
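
To show how little work branch creation involves, here's a Python sketch that "creates a branch" by writing one small file into a scratch directory. It's an illustration only: the create_branch helper is my own, and in a real repository you should let git branch do this, since Git also handles locking and may keep refs in a packed-refs file.

```python
import os
import tempfile


def create_branch(git_dir: str, name: str, commit_id: str) -> str:
    """Create a branch by writing the commit ID (plus newline) to a ref file."""
    ref_path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w", newline="\n") as f:
        f.write(commit_id + "\n")
    return ref_path


# A throwaway directory standing in for a real .git folder.
git_dir = os.path.join(tempfile.mkdtemp(), ".git")
path = create_branch(git_dir, "feature-login", "a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5")
print(os.path.getsize(path))  # 40 hex characters + 1 newline
```

No file from the project is copied or even read; the cost of a branch is one tiny write.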

HEAD: Where You Are

HEAD is a special pointer that tells Git "which commit am I currently on?"

bash

$ cat .git/HEAD
ref: refs/heads/main

# HEAD points to the main branch
# main points to commit a3f8b2c
# So you're on commit a3f8b2c

Detached HEAD (when HEAD points directly to a commit):

bash

$ git checkout a3f8b2c
# HEAD is now at a3f8b2c (detached)

$ cat .git/HEAD
a3f8b2c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5

# HEAD points directly to a commit, not a branch
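
Both cases can be handled with a few lines of code. Here's a simplified Python sketch of resolving HEAD to a commit; the resolve_head function is hypothetical, and it ignores details such as packed-refs and branches that have no commits yet.

```python
import os
from typing import Optional, Tuple


def resolve_head(git_dir: str) -> Tuple[str, Optional[str]]:
    """Return (commit_id, branch_name); branch_name is None when detached."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # e.g. refs/heads/main
        with open(os.path.join(git_dir, ref)) as f:
            commit_id = f.read().strip()
        branch = ref[len("refs/heads/"):] if ref.startswith("refs/heads/") else ref
        return commit_id, branch
    # Detached HEAD: the file holds the commit ID itself.
    return head, None
```

On a branch, resolution takes two file reads (HEAD, then the ref); in detached state it takes one.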

The Mental Model: Putting It All Together

Let's build a complete mental model of how everything works together.

The Git Universe

┌─────────────────────────────────────────────────────────────┐
│                        YOUR PROJECT                          │
│                                                              │
│  ┌────────────────────────────────────────────────────┐    │
│  │              WORKING DIRECTORY                      │    │
│  │  (Files you see and edit)                          │    │
│  │                                                     │    │
│  │  my-project/                                       │    │
│  │  ├── index.html                                    │    │
│  │  ├── about.html                                    │    │
│  │  └── css/style.css                                 │    │
│  └────────────────────────────────────────────────────┘    │
│                          │                                   │
│                          │ git add                           │
│                          ▼                                   │
│  ┌────────────────────────────────────────────────────┐    │
│  │              INDEX (Staging Area)                   │    │
│  │  .git/index                                        │    │
│  │                                                     │    │
│  │  Staged files:                                     │    │
│  │  ├─ index.html  → blob 1a2b3c                     │    │
│  │  ├─ about.html  → blob 4d5e6f                     │    │
│  │  └─ css/style.css → blob 9j0k1l                   │    │
│  └────────────────────────────────────────────────────┘    │
│                          │                                   │
│                          │ git commit                        │
│                          ▼                                   │
│  ┌────────────────────────────────────────────────────┐    │
│  │              REPOSITORY (.git folder)               │    │
│  │                                                     │    │
│  │  COMMITS:                                          │    │
│  │  C1 ← C2 ← C3                                     │    │
│  │             ↑                                      │    │
│  │             └─ main                                │    │
│  │                                                     │    │
│  │  OBJECTS:                                          │    │
│  │  ├─ Commits (C1, C2, C3)                          │    │
│  │  ├─ Trees (directory structures)                   │    │
│  │  └─ Blobs (file contents)                         │    │
│  │                                                     │    │
│  │  REFS:                                             │    │
│  │  ├─ heads/main → C3                               │    │
│  │  └─ HEAD → refs/heads/main                        │    │
│  └────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

The Complete Workflow

USER ACTIONS                 WHAT GIT DOES INTERNALLY
─────────────────           ──────────────────────────────

1. Edit index.html          (Changes only in working directory)
   │
   │
2. git add index.html  ──▶  • Hash content → 1a2b3c
   │                        • Create blob in objects/1a/2b3c
   │                        • Update .git/index
   │
   │
3. git commit          ──▶  • Create tree object (4d5e6a)
   │                        • Create commit object (a3f8b2)
   │                        • Update refs/heads/main
   │                        • HEAD follows (it still names main)
   │
   │
4. git log             ──▶  • Read commit chain from HEAD
   │                        • Display commit metadata
   │
   │
5. git status          ──▶  • Compare working vs index (hashes)
                            • Compare index vs HEAD (hashes)
                            • Report differences

Practical Exploration: See It Yourself

Let's do a hands-on exercise to see Git's internals in action.

Exercise: Create and Inspect Git Objects

bash

# 1. Create a new test repository
mkdir git-internals-test
cd git-internals-test
git init

# 2. Create a simple file
echo "Hello Git Internals" > test.txt

# 3. Add it (creates a blob!)
git add test.txt

# 4. Find the blob hash (substitute your own hash in the steps below)
git hash-object test.txt
# Output: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (placeholder: that's the empty-file hash, yours will differ)

# 5. Look inside the objects directory
find .git/objects -type f
# Output: .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391

# 6. Read the blob content
git cat-file -p e69de29
# Output: Hello Git Internals

# 7. See what type of object it is
git cat-file -t e69de29
# Output: blob

# 8. Make a commit
git commit -m "Add test file"
# Output: [main (root-commit) a3f8b2c] Add test file

# 9. Find the commit hash (use your actual hash)
git log --oneline
# Output: a3f8b2c Add test file

# 10. Inspect the commit object
git cat-file -p a3f8b2c
# Output:
# tree 4d5e6a7812...
# author Your Name <email> <timestamp> <timezone>
# committer Your Name <email> <timestamp> <timezone>
#
# Add test file

# 11. Inspect the tree
git cat-file -p 4d5e6a7812
# Output:
# 100644 blob e69de29bb2...  test.txt

# 12. See all objects
find .git/objects -type f
# You'll see:
# - Blob (e69de29...)
# - Tree (4d5e6a7...)
# - Commit (a3f8b2c...)

What you learned:

git add created a blob immediately
git commit created a tree and commit object
All objects are stored in .git/objects/
You can inspect any object with git cat-file -p <hash>
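
If you're wondering what git cat-file does with those files, here's a Python sketch that writes and reads a loose object in a scratch directory. Both helper functions are my own illustrations. They assume SHA-1, need the full 40-character ID, and know nothing about packfiles.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir: str, kind: str, body: bytes) -> str:
    """Store header + body, zlib-compressed, under objects/<2 chars>/<38 chars>."""
    store = f"{kind} {len(body)}".encode("ascii") + b"\0" + body
    oid = hashlib.sha1(store).hexdigest()
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return oid


def read_loose_object(git_dir: str, oid: str):
    """Rough equivalent of `git cat-file -t` plus `git cat-file -p` for a blob."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        store = zlib.decompress(f.read())
    header, _, body = store.partition(b"\0")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind, body


git_dir = os.path.join(tempfile.mkdtemp(), ".git")
oid = write_loose_object(git_dir, "blob", b"Hello Git Internals\n")
print(read_loose_object(git_dir, oid))
```

That zlib compression is why opening a file under .git/objects/ in a text editor shows binary noise instead of your content.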

Exercise: See How Deduplication Works

bash

# 1. Create two files with identical content
echo "Same content" > file1.txt
echo "Same content" > file2.txt

# 2. Add both
git add file1.txt file2.txt

# 3. Get hashes
git hash-object file1.txt
git hash-object file2.txt
# Both output: 5e40c0877058c504203932e5136051cf3cd3519b (same!)

# 4. Find objects
find .git/objects -type f
# Only ONE blob for both files!

# 5. Commit
git commit -m "Two files, one blob"

# 6. Inspect tree
# (quoted so the shell doesn't try to interpret ^ or {})
git cat-file -p 'HEAD^{tree}'
# Output:
# 100644 blob 5e40c08...  file1.txt
# 100644 blob 5e40c08...  file2.txt
#              ↑               ↑
#              └───────┬───────┘
#                 Same hash!

Magic: Git stored the content only once, even though it appears in two different files!

How Git Ensures Integrity

Git uses hashes everywhere, and this provides incredible data integrity.

Tamper Detection

Scenario: Someone tries to modify an old commit

Original commit C2:
├── tree: abc123
├── parent: xyz789
├── author: Alice
├── message: "Fix bug"
└── hash: def456 (calculated from all above)

Tampered commit C2:
├── tree: abc123
├── parent: xyz789
├── author: Bob        ← Changed!
├── message: "Fix bug"
└── hash: ???

When Git recalculates the hash:
Hash ≠ def456 → Corruption detected!

Result: Any modification to a commit changes its hash, which cascades up the chain, making it obvious something was tampered with.
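
Here's the Alice-to-Bob scenario as a small Python sketch, assuming SHA-1 and made-up email addresses. In a real repository, git fsck performs this kind of object check for you.

```python
import hashlib


def commit_id(body: bytes) -> str:
    """Hash a commit body the way Git does: 'commit <size>' + NUL + body."""
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def verify(stored_id: str, body: bytes) -> bool:
    """True when the body still hashes to the ID it is stored under."""
    return commit_id(body) == stored_id


original = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Alice <alice@example.com> 1735567921 +0000\n"
    b"committer Alice <alice@example.com> 1735567921 +0000\n"
    b"\n"
    b"Fix bug\n"
)
stored_id = commit_id(original)

# Someone rewrites the author line but keeps the old ID.
tampered = original.replace(
    b"author Alice <alice@example.com>", b"author Bob <bob@example.com>"
)
print(verify(stored_id, original), verify(stored_id, tampered))  # True False
```

The tampered body no longer matches its stored ID, so the change can't hide.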

The Merkle Tree Structure

Git's object graph is a Merkle DAG, a generalization of the Merkle tree (or hash tree):

       Commit C3 (hash includes ↓)
            │
            ├─── tree T3 (hash includes ↓)
            │       ├─── blob B1
            │       └─── blob B2
            │
            └─── parent: C2 (hash includes ↓)
                     │
                     ├─── tree T2
                     └─── parent: C1
                              ├─── tree T1
                              └─── parent: none (root commit)

Why this matters:

Change any blob → tree hash changes
Change tree → commit hash changes
Change commit → child commit hash changes
You can verify the entire history by checking one hash (HEAD)

This is the same cryptographic structure used in blockchain!
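
Here's a compact Python sketch of that cascade for a one-file project. Everything in it (the helper names, the fixed author line, the "msg" commit message) is a simplifying assumption; the point is only that each hash feeds into the next.

```python
import hashlib
from typing import List, Optional


def oid(kind: str, body: bytes) -> str:
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()


def commit_for(file_content: bytes, parent: Optional[str]) -> str:
    """Hash blob -> tree -> commit for a project with a single index.html."""
    blob = oid("blob", file_content)
    tree = oid("tree", b"100644 index.html\0" + bytes.fromhex(blob))
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += ["author A <a@example.com> 0 +0000",
              "committer A <a@example.com> 0 +0000", "", "msg"]
    return oid("commit", ("\n".join(lines) + "\n").encode())


def history(versions: List[bytes]) -> List[str]:
    """Return the commit IDs for a linear history of file versions."""
    ids, parent = [], None
    for content in versions:
        parent = commit_for(content, parent)
        ids.append(parent)
    return ids


honest = history([b"v1\n", b"v2\n", b"v3\n"])
forged = history([b"v1 (edited)\n", b"v2\n", b"v3\n"])
print([a == b for a, b in zip(honest, forged)])
```

Editing only the oldest version changes every later commit ID, which is why knowing the latest hash pins down everything before it.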

Common Misconceptions Clarified

Misconception #1: "Git stores diffs"

Reality: Git stores complete snapshots, not diffs.

When you commit, Git saves the full content of every file (as blobs), not just what changed. However, Git is smart about storage:

Identical content = same blob = stored once
Git can later pack objects into packfiles, which use delta compression to save space
Diffs are calculated on-the-fly when you run git diff or git log -p
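
To illustrate "diffs are calculated on-the-fly", here's a Python sketch that derives a unified diff from two complete snapshots using the standard difflib module. Git uses its own diff algorithms (Myers by default), so real output can differ in detail; the idea is the same.

```python
import difflib


def diff_snapshots(old: str, new: str, name: str = "index.html") -> str:
    """Compute a unified diff from two complete versions of a file."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


print(diff_snapshots("<h1>Hello</h1>\n", "<h1>Hello World</h1>\n"))
```

Nothing about the diff was stored anywhere: it was derived from the two snapshots at the moment it was requested.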

Misconception #2: "Branches are copies of code"

Reality: Branches are just 41-byte files containing a commit hash.

Creating a branch doesn't copy any files. It just creates a new pointer. This is why branching in Git is instant and uses negligible space.

Misconception #3: "The staging area is unnecessary"

Reality: The index gives you precise control over commits.

The staging area lets you:

Commit only some of your changes
Review exactly what will be committed
Build commits incrementally
Stage parts of files (with git add -p)

Misconception #4: "Git is slow because it checks every file"

Reality: Git uses timestamps and hashes for lightning-fast comparisons.

Git first checks file metadata (size, modification time). Only if that changed does it hash the content. And even then, comparing hashes is instant.

Visualizing Git's Architecture

Let me give you one final, complete visualization:

┌─────────────────────────────────────────────────────────────────────┐
│                        THE .GIT FOLDER                               │
│                    (Git's Complete Database)                         │
└─────────────────────────────────────────────────────────────────────┘

.git/
│
├── HEAD ─────────────────────▶ ref: refs/heads/main
│                               "I'm on the main branch"
│
├── index ────────────────────▶ Binary file tracking staged changes
│                               (The staging area)
│
├── config ───────────────────▶ Repository settings
│                               (user name, remotes, etc.)
│
├── objects/ ─────────────────▶ THE HEART: All your data
│   ├── 1a/
│   │   └── 2b3c4d... ────────▶ Blob (file content)
│   ├── 4d/
│   │   └── 5e6a7b... ────────▶ Tree (directory)
│   ├── a3/
│   │   └── f8b2c9... ────────▶ Commit (snapshot)
│   └── pack/ ────────────────▶ Compressed objects (Git optimizes later)
│
├── refs/ ────────────────────▶ All your branches and tags
│   ├── heads/
│   │   ├── main ─────────────▶ a3f8b2c9... (commit hash)
│   │   └── feature-login ───▶ 7g8h9i0j... (commit hash)
│   └── tags/
│       └── v1.0 ─────────────▶ 5k6l7m8n... (commit hash)
│
└── logs/ ────────────────────▶ History of what branches pointed where
    ├── HEAD ─────────────────▶ Reflog (your local history)
    └── refs/heads/main ──────▶ main branch's history

Why Understanding This Matters

You might be wondering: "Why do I need to know all this? Can't I just use Git without understanding internals?"

Fair question! Here's why this knowledge is valuable:

1. Debugging Issues

When something goes wrong, you'll know where to look:

Lost commit? Check git reflog (it's in .git/logs/)
Confused about merge? You understand how commits link together
Worried about disk space? You know Git deduplicates blobs

2. Using Advanced Features

Understanding internals makes advanced commands make sense:

git rebase rewrites commits (creates new hashes)
git cherry-pick re-applies a commit's changes on top of HEAD (a new commit with a new hash and, usually, a different tree)
git gc repacks objects and cleans up loose objects

3. Building Mental Models

Instead of memorizing commands, you understand why things work:

Why branches are so fast (just pointers!)
Why Git is so reliable (cryptographic hashing)
How Git saves space (content-addressable storage)

4. Confidence

You're not afraid of Git anymore because you know what it's doing. You can experiment, knowing the .git folder has your back.

Key Takeaways

Let's recap the essential concepts:

The Big Ideas

Everything is an object: Blobs (file content), Trees (directories), Commits (snapshots)
Hashes are everywhere: Git uses SHA-1 hashes as unique identifiers and for integrity
The .git folder is Git: Your entire project history lives in this one folder
Commits are snapshots: Git saves complete project states, not just diffs
Branches are pointers: Lightweight, fast, and use almost no disk space
Content-addressable storage: Same content = same hash = stored once
Three-stage workflow: Working Directory → Index → Repository

The Internal Process

git add:
  1. Hash file content
  2. Create blob in objects/
  3. Update index

git commit:
  1. Create tree objects from index
  2. Create commit object pointing to tree
  3. Update branch reference
  4. HEAD follows automatically (it still names the branch)

The Mental Model

Commit ────▶ Tree ────▶ Blobs
  │           │
  │           └─────▶ Trees (subdirectories)
  │
  └─ Points to parent commit(s)
     └─ Creates history chain

What's Next?

Now that you understand Git's internals, you're ready for more advanced topics:

Remote repositories: How git push and git pull work with Git's object model
Merge strategies: How Git combines different commit histories
Rebase internals: How Git rewrites commit history
Git packfiles: How Git optimizes storage
Reflog: Your safety net for recovering "lost" commits
Garbage collection: How Git cleans up unused objects

Final Thoughts

Git is beautifully elegant once you understand it. Everything revolves around:

Content-addressable storage (hashes)
Immutable objects (commits, trees, blobs)
Lightweight pointers (branches, tags, HEAD)

Git's genius is hiding this complexity behind simple commands while giving you the power of a cryptographically secure, distributed version control system.

Next time you run git commit, you'll know exactly what's happening behind the scenes. And that knowledge? That's power.

Now go explore your .git folder with confidence! 🚀

Want to dive deeper? Try these exercises:

Use git cat-file -p to explore every object in your repository
Watch how blob hashes change when you modify file content
Create multiple branches and observe .git/refs/heads/
Check your reflog with git reflog to see where HEAD has pointed over time

Questions or discoveries? I'd love to hear what you found in the comments!

Atharv Dange