# How Git Works Internally: Building a Mental Model

Most Git tutorials focus on commands.  
This article focuses on **what actually happens inside Git** when we run those commands.

> Before reading further, you may find it helpful to go through these earlier articles:
> 
> [**Why Version Control Exists: The Pendrive Story Every Developer Has Lived**](https://dhruvbhartia07.hashnode.dev/why-version-control-exists-the-pendrive-story-every-developer-has-lived?showSharer=true)
> 
> [**Understanding Git: Why It Exists and How We Use It**](https://dhruvbhartia07.hashnode.dev/understanding-git-why-it-exists-and-how-we-use-it?showSharer=true)

We’ll explore:

* What the `.git` folder is and why it exists
    
* How Git stores data internally using objects
    
* Why Git commits are called *snapshots*
    
* How Git tracks changes efficiently
    

All examples below are based on **hands-on experimentation**, not theory.

---

## The `.git` Directory: The Heart of Git

When we initialize a Git repository:

```bash
git init
```

Git creates a `.git/` directory.

```bash
/home/app # ls -la
drwxr-xr-x    6 root root 4096 Jan 17 12:59 .git
```

### Why does `.git` exist?

* `.git/` stores **all information about version tracking**
    
* This includes:
    
    * commit history
        
    * branches
        
    * file snapshots
        
    * metadata
        
* If the `.git/` directory is lost (and no copy of the repository exists elsewhere):
    
    * all Git history and tracking information are lost
        
    * the files in the working directory remain, but Git has no record of them
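
To see this for yourself, here is a small throwaway experiment. It is a sketch that assumes `git` is installed. It works inside a temporary directory from `mktemp -d`, uses a placeholder identity (`Demo <demo@example.com>`), and moves `.git/` aside instead of deleting it, so the history can be put back afterwards.

```bash
set -eu

demo=$(mktemp -d)                        # scratch area; no real repository is touched
export GIT_CEILING_DIRECTORIES="$demo"   # stop Git from finding a repository above $demo
mkdir "$demo/app"
cd "$demo/app"

git init -q
git config user.name "Demo"              # placeholder identity for this sketch
git config user.email "demo@example.com"
echo "This is line 1" > app.txt
git add app.txt
git commit -q -m "Create App file"

mv .git "$demo/git-backup"               # "lose" the .git directory

ls app.txt                               # the file is still here...
if git rev-parse --git-dir >/dev/null 2>&1; then
  repo_found=yes
else
  repo_found=no                          # ...but Git no longer sees a repository
fi
echo "repository found: $repo_found"

mv "$demo/git-backup" .git               # put it back, and the history returns
git log --oneline
```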
        

---

## From Working Directory to Commit

Any change you make flows through these stages:

1. **Working Directory**
    
    * New file or modified file
        
2. **Staging Area**
    
    * Changes selected to be recorded
        
3. **Commit**
    
    * A snapshot of the project is stored permanently
        

At a high level:

```plaintext
Working Directory → Staging Area → Commit
```
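The three stages can be watched directly with `git status --porcelain`, whose short status codes are easy to compare. This is a minimal sketch in a throwaway repository; the identity is a placeholder.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity, needed for commits
git config user.email "demo@example.com"

echo "This is line 1" > app.txt
stage1=$(git status --porcelain)         # working directory only: "?? app.txt"

git add app.txt
stage2=$(git status --porcelain)         # staged: "A  app.txt"

git commit -q -m "Create App file"
stage3=$(git status --porcelain)         # committed: nothing left to report

printf '%s\n' "untracked: $stage1" "staged:    $stage2" "committed: '$stage3'"
```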

To really understand Git, we need to **zoom into what a commit actually contains**.

---

## A Simple Repository Walkthrough

### Initialize a repo

```bash
mkdir app
cd app
git init
```

Create a file:

```bash
touch app.txt
git status
```

Git shows the file as **untracked**.

---

### First Commit

```bash
git add app.txt
git commit -m "Create App file"
```

This creates the **first commit**, called the **root commit** because it has no parent.

Checking history:

```bash
git log
```

The log now shows a single commit: the **starting point of the commit chain**.
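What makes the first commit a *root* can be checked directly: its object has a `tree` line but no `parent` line. The sketch below (throwaway repository, placeholder identity) counts the `parent` lines and cross-checks with `git rev-list --max-parents=0`, which lists commits that have no parents.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"

touch app.txt
git add app.txt
git commit -q -m "Create App file"

git cat-file -p HEAD                     # tree, author, committer, message; no parent

# grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps set -e happy.
parents=$(git cat-file -p HEAD | grep -c '^parent ' || true)
root=$(git rev-list --max-parents=0 HEAD)
echo "parent lines: $parents"
echo "root commit:  $root"
```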

---

## Git Commit History Is a Chain

Each commit:

* Has a unique hash
    
* Stores a reference to its **parent commit** (a root commit has none; a merge commit has two or more)
    

This forms a linked structure:

```plaintext
commit → parent → parent → ...
```
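Because each commit records its parent's hash, the chain can be walked without `git log`. This sketch (throwaway repository, placeholder identity) makes three commits, then follows the first `parent` line of each commit object until it reaches the root. `awk` stops at the blank line that ends the commit header, so a commit message that happens to start with `parent` is not misread.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "This is line $n" >> app.txt
  git add app.txt
  git commit -q -m "Add line $n"
done

count=0
commit=$(git rev-parse HEAD)
while [ -n "$commit" ]; do
  count=$((count + 1))
  echo "$commit"
  last=$commit
  # First "parent" line of the header; empty output for the root commit ends the loop.
  commit=$(git cat-file -p "$commit" | awk 'NF == 0 { exit } $1 == "parent" { print $2; exit }')
done
echo "commits in chain: $count"
```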

To inspect a commit internally:

```bash
git cat-file -p <commit-hash>
```

Example output (from a later commit that added a second line to `app.txt` and a new `README.md`):

```plaintext
tree bda94d5297b34fc5391112596c3f6b2926891352
parent 37087ac939b14f57c7b223d0903ffb5cb4d1896a
author ...
committer ...

Add Line 2 in app and new Readme file
```

### What this tells us

A commit stores:

* a reference to a **tree**
    
* a reference to its **parent**
    
* the author, the committer (each with a timestamp), and the commit message
    

So the commit itself does **not store file contents directly**.
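The commit → tree → blob layering can be checked with `git cat-file -t`, which prints an object's type. The revision syntax `HEAD^{tree}` (the tree of a commit) and `HEAD:app.txt` (a path inside that tree) is standard Git; the repository is a throwaway with a placeholder identity.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"
echo "This is line 1" > app.txt
git add app.txt
git commit -q -m "Create App file"

commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')      # the tree this commit points to
blob=$(git rev-parse HEAD:app.txt)       # the blob that tree lists for app.txt

# Print "<type> <hash>" for each layer.
for obj in "$commit" "$tree" "$blob"; do
  echo "$(git cat-file -t "$obj") $obj"
done
```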

---

## Tree Objects: Representing Folder Structure

Let’s inspect the tree object:

```bash
git cat-file -p <tree-hash>
```

Output:

```plaintext
100644 blob 3485b695ca9834fcdc2bf439f1c12109b8b54634    README.md
100644 blob 40f9bae6a2073fc65d8e2b618b73534a84317ad7    app.txt
```

A **tree**:

* represents a directory
    
* maps each entry name to a file mode, an object type, and a hash (a blob for each file)
    
* can reference other trees (for subdirectories)
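
To see a tree referencing another tree, add a subdirectory. In this sketch (throwaway repository, placeholder identity; the `docs/` folder and `notes.txt` are made up for illustration), the root tree lists `docs` with mode `040000` and type `tree`, and that subtree in turn lists the blob for `notes.txt`.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"

mkdir docs
echo "This is line 1" > app.txt
echo "Some notes" > docs/notes.txt       # hypothetical file in a subdirectory
git add .
git commit -q -m "Add app and docs"

git cat-file -p 'HEAD^{tree}'            # root tree: a blob entry and a tree entry
git cat-file -p HEAD:docs                # the docs subtree: one blob entry

docs_type=$(git cat-file -t HEAD:docs)
notes_type=$(git cat-file -t HEAD:docs/notes.txt)
docs_mode=$(git ls-tree HEAD docs | awk '{ print $1 }')
```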
    

---

## Blob Objects: Actual File Content

Inspecting a blob:

```bash
git cat-file -p <blob-hash>
```

Output:

```plaintext
This is line 1
This is line 2
```

A **blob**:

* stores **only file content**
    
* has no filename or file-mode information (the tree records those)
    
* same content → same blob hash
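
The "same content → same blob hash" rule follows from how a blob's ID is computed: it is the SHA-1 of a short header (`blob`, a space, the size in bytes, a NUL byte) followed by the raw content. The sketch below reproduces that by hand and compares it with `git hash-object`. It assumes the default SHA-1 object format and GNU `sha1sum` (on macOS, `shasum` plays the same role). No repository is needed, because nothing is written to the object database.

```bash
set -eu

f=$(mktemp)
printf 'test content\n' > "$f"

# Header "blob <size>\0" followed by the file's bytes, hashed with SHA-1.
size=$(wc -c < "$f" | tr -d ' ')
manual=$( { printf 'blob %s\0' "$size"; cat "$f"; } | sha1sum | cut -d ' ' -f 1 )

# Git computes the same ID; without -w nothing is stored.
by_git=$(git hash-object "$f")

# A second file with identical bytes but a different name gets the same ID.
copy=$(mktemp)
cp "$f" "$copy"
copy_id=$(git hash-object "$copy")

echo "manual:     $manual"
echo "git:        $by_git"
echo "same bytes: $copy_id"
```

All three lines should print `d670460b4b4aece5915caf5c68d12f560a9fe3e4`, the well-known ID for the content `test content` plus a newline.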
    

---

## Why Commits Are Snapshots

After adding a new line and committing again:

```bash
echo "This is line 3" >> app.txt
git commit -am "Add line 3 in app"
```

Inspecting the new commit shows:

* a **new tree**
    
* a **new blob for** `app.txt`
    
* **same blob hash for** `README.md`
    

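This shows that:

* Each commit represents a **full snapshot** of the project
    
* Unchanged files reuse existing blobs
    
* Identical content is stored only once, so repeated snapshots stay cheap
    

> At the object level, Git does not save “changes”; it saves **states**. (Packfiles may later delta-compress objects on disk, but that is a storage optimization, not part of the model.)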
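
The reuse of unchanged blobs can be checked with the `<commit>:<path>` syntax, which resolves to the blob a commit recorded for a file. The sketch below (throwaway repository, placeholder identity, made-up file contents) mirrors the walkthrough: it changes only `app.txt` between two commits and compares the IDs.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"

echo "This is line 1" > app.txt
echo "# App" > README.md                 # made-up README content
git add app.txt README.md
git commit -q -m "Create app and README"

echo "This is line 2" >> app.txt         # change only app.txt
git commit -q -am "Add line 2 in app"

# Blob recorded for each file in the previous commit (HEAD~1) and the latest (HEAD).
for f in app.txt README.md; do
  echo "$f: $(git rev-parse "HEAD~1:$f") -> $(git rev-parse "HEAD:$f")"
done
```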

---

## Exploring the `.git` Directory

Listing `.git/`:

```bash
ls .git/

output:
HEAD
objects
refs
index
logs
...
```

For internal understanding, we focus on:

* `HEAD`
    
* `refs`
    
* `objects`
    

---

## HEAD and Branches

```bash
cat .git/HEAD

output:
ref: refs/heads/master
```

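Normally, `HEAD` points to:

* a branch
    
* which points to a commit
    

In a *detached HEAD* state (for example, after checking out a commit hash directly), `HEAD` stores the commit hash itself instead.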
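
Following that pointer chain by hand takes two file reads. This sketch assumes the default "files" ref storage and a freshly created branch, so the ref is still a loose file under `.git/refs/heads/` (refs can later be moved into `.git/packed-refs`, where this simple `cat` would not find them). The default branch name differs between Git setups, so it is read from `HEAD` rather than hard-coded; the identity is a placeholder.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"
touch app.txt
git add app.txt
git commit -q -m "Create App file"

cat .git/HEAD                                      # e.g. "ref: refs/heads/master"
head_ref=$(sed -n 's/^ref: //p' .git/HEAD)         # strip the "ref: " prefix
by_hand=$(cat ".git/$head_ref")                    # the commit hash in the branch file

echo "HEAD -> $head_ref -> $by_hand"
echo "git rev-parse HEAD: $(git rev-parse HEAD)"   # Git's own resolution agrees
```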
    

Inspect the branch ref:

```bash
cat .git/refs/heads/master

output:
28c6f9787e22397050b706616d20e1c8cccbdc89
```

Creating a new branch:

```bash
git checkout -b feature
```

Now:

```plaintext
refs/heads/
├── master
└── feature
```

Both branches initially point to the **same commit**.

This shows:

> **A branch is just a file containing a commit hash.** Git may later consolidate these files into `.git/packed-refs`, but the idea stays the same.
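
Because a branch is only a pointer, creating one copies a hash, and committing moves only the current branch's pointer. This sketch (throwaway repository, placeholder identity, loose ref files as in any fresh repository) uses `git branch feature` so the current branch stays checked out, then compares the two ref files before and after a commit.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"
echo "This is line 1" > app.txt
git add app.txt
git commit -q -m "Create App file"

git branch feature                       # new branch at the current commit
cur_ref=$(git symbolic-ref HEAD)         # the checked-out branch, e.g. refs/heads/master
before=$(cat ".git/$cur_ref")
feature=$(cat .git/refs/heads/feature)   # same hash as the current branch

echo "This is line 2" >> app.txt
git commit -q -am "Add line 2"
after=$(cat ".git/$cur_ref")             # only the current branch file changed

echo "$cur_ref: $before -> $after"
echo "refs/heads/feature: $(cat .git/refs/heads/feature)"
```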

---

## Objects Directory: Where Git Stores Everything

```bash
ls .git/objects/

output:
20 28 34 37 40 9f bd e9 f7 info pack
```

Each two-character folder (everything except `info` and `pack`):

* is named using the **first two characters** of an object hash
    
* contains files named with the **remaining 38 characters** of that hash
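
Turning an object hash into its path under `objects/` is just string slicing. This sketch (throwaway repository, placeholder identity) does that for the newest commit. It applies to *loose* objects only; objects that Git has packed live inside files in `objects/pack/` instead.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"
touch app.txt
git add app.txt
git commit -q -m "Create App file"

h=$(git rev-parse HEAD)
dir=$(printf '%s' "$h" | cut -c 1-2)     # first two characters -> folder name
rest=$(printf '%s' "$h" | cut -c 3-)     # remaining 38 characters -> file name
ls -l ".git/objects/$dir/$rest"
```

The file is zlib-compressed, so `cat` on it prints binary noise; `git cat-file -p "$h"` is the readable way to view it.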
    

Example:

```bash
ls .git/objects/28/
```

This object corresponds to the commit we inspected earlier using `git cat-file`.

So blobs, trees, and commits all live together in `objects/`.
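
`git cat-file` can also list every object in the repository together with its type, which makes the mix of blobs, trees, and commits in `objects/` visible in one go. In this sketch (throwaway repository, placeholder identity), a single commit of a single file produces exactly one object of each kind. `--batch-all-objects` needs a reasonably recent Git.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"
echo "This is line 1" > app.txt
git add app.txt
git commit -q -m "Create App file"

# One line per object: hash, type, size in bytes.
git cat-file --batch-all-objects --batch-check

# Just the types, sorted and joined on one line.
types=$(git cat-file --batch-all-objects --batch-check='%(objecttype)' | sort | tr '\n' ' ')
echo "object types: $types"
```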
    

---

## How Git Tracks Changes (Mental Model)

Putting it all together:

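* `git add`
    
    * writes a blob for each added file into `objects/`
        
    * records that blob's hash in the staging area (the `.git/index` file)
        
* `git commit`
    
    * builds a tree (plus subtrees for folders) from the staging area
        
    * creates a commit pointing to that tree
        
    * links the new commit to its parent
        
* The current branch ref moves forward to the new commit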
    
* Old objects remain immutable
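
The same steps can be replayed with Git's lower-level "plumbing" commands, which makes each part of the model explicit. This is a sketch in a throwaway repository with a placeholder identity; `hash-object -w`, `update-index --cacheinfo`, `write-tree`, `commit-tree`, and `update-ref` are real Git commands, but in everyday use `git add` and `git commit` do all of this (and more) for you.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"
echo "This is line 1" > app.txt

# Roughly what "git add app.txt" does:
blob=$(git hash-object -w app.txt)                          # write the blob into objects/
git update-index --add --cacheinfo 100644 "$blob" app.txt   # record it in the index

# Roughly what "git commit" does:
tree=$(git write-tree)                                      # tree object built from the index
commit=$(echo "Create App file" | git commit-tree "$tree")  # commit object (no -p: a root commit)
git update-ref "$(git symbolic-ref HEAD)" "$commit"         # move the current branch ref

git log --oneline
git status --short                                          # nothing left to commit
```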
    

---

## Hashes and Integrity

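Git names every object by a hash of its content (SHA-1 by default; recent Git versions can also use SHA-256). These hashes let Git: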

* uniquely identify content
    
* detect corruption
    
* avoid duplicate storage
    

Same content → same hash  
Different content → different hash (for all practical purposes)
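Both rules, plus the integrity check that relies on them, can be seen with `git hash-object`. In this sketch (throwaway repository), identical input gives an identical ID, a one-character change gives a completely different one, and re-hashing a stored object's content reproduces its name. That last comparison is one of the checks `git fsck` performs across the object database.

```bash
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q

a=$(printf 'This is line 1\n' | git hash-object --stdin)
b=$(printf 'This is line 1\n' | git hash-object --stdin)
c=$(printf 'This is line 1.\n' | git hash-object --stdin)   # one extra character

echo "same content:      $a"
echo "same content:      $b"
echo "different content: $c"

# Store the blob, read it back, and hash it again: the ID must match its name.
stored=$(printf 'This is line 1\n' | git hash-object -w --stdin)
rehash=$(git cat-file -p "$stored" | git hash-object --stdin)
echo "stored $stored, re-hashed $rehash"
```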

---

## Final Takeaway

This exploration shows that Git is:

* not magic
    
* not just a set of commands
    
* but a **content-addressed snapshot database**
    

Understanding this internal model makes:

* branching intuitive
    
* history manipulation safer
    
* Git errors less scary
