Dhruv's DevOps & Engineering Notes

I used to think Docker + systemd was enough

Dhruv Bhartia — Wed, 25 Mar 2026 23:12:54 GMT

Around late 2019 / early 2020, I was working with setups where applications were running directly on VMs.

Each team had their own set of VMs, environments were split, CIDR ranges were being managed separately. Deployments were done using scripts/playbooks, which felt like mini package managers in their own way - it was quite a lot.

At that time, I remember having very basic questions:

why so many VMs?
why can't we just run multiple apps on the same machine using different ports?

I didn't really have a clear answer back then. It just felt like "this is how it's done".

Around the same time, I had also started exploring Kubernetes.

So there were two things happening in parallel:

seeing apps run on VMs in this structured but slightly heavy setup
and learning Kubernetes, without fully understanding where it actually fits

I didn't connect these two at that point.

Recently in 2026, while preparing for CKA/CKAD, I found myself thinking about that phase again.

Back when I had just started exploring Kubernetes, I used to hear explanations like:

"we can't use just Docker, we need Kubernetes as it makes the system reliable"

At that time, I interpreted it very literally.

Containers can go down.
Kubernetes makes sure they come back up.

And somewhere in that, a very practical doubt came up:

if I can package an app in a container, and use something like systemd to restart it when it crashes.. isn't that enough?

So it would be like:

container handles packaging
systemd handles restarts

so what exactly is Kubernetes adding here?

I didn't push that thought very far.

I didn't have enough context, and over time it just faded.

Experiment: Here is a GitHub link with steps, if anyone is interested: lab - docker-systemd-reliability

But recently, when I came back to that old thought, I tried to reason through it again.

Q. What if I need more than one instance?

-> Okay.. I can probably run multiple containers.

Q. What if they need to run on different machines?

-> Hmm.. maybe we need a proxy at each VM for that app, but the exact approach was unclear.

Q. What if one of those machines goes down?

-> Now it starts to resemble the old setup, where we can't do much without manual intervention.

These questions made me realize:

Earlier, I was thinking about how to run this app properly. My view was limited to Docker and systemd, which were right in front of me. I was just seeing it as an app and a server.

The setup itself isn't failing. It works at an individual level, but it doesn't really answer these new questions - the ones that go beyond a single machine.

And that's where Kubernetes started to make more sense - not as a tool or usage pattern, but in terms of the problem it is actually solving.

It is handling the part I wasn't even looking at earlier. For the questions above, it kind of fills that missing gap.

I have been using Kubernetes for some time now, learning it and adopting it in my daily work.

I knew how to:

deploy things
debug issues
check logs
fix problems

And interestingly, those steps haven't really changed even now. What has changed is the depth and design that I see behind the same actions.

And on top of that, the idea of "run something and keep it alive on the machine" still exists - kubelet.

Kubelet is not exactly systemd + Docker and does a lot more, but a part of its responsibility still feels similar: making sure whatever is supposed to run is actually running on the node.

What has changed is how I see what's happening underneath.

Earlier, kubectl felt like a command.

Now it feels more like:

I'm making an API call to a system

auth details are stored in a context (kubeconfig)
there's a control plane
components talking to each other
state being stored and reconciled
decisions being made about where things should run
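The "state being stored and reconciled" part can be sketched as a function that compares what should be running with what is actually running. This is only a toy model and not how Kubernetes is implemented: the node names, the `reconcile` function, and the "least-loaded node" placement rule are all assumptions made for illustration.

```python
from collections import Counter


def reconcile(desired_replicas, placements, healthy_nodes):
    """Return a new placement list that matches the desired replica count.

    placements: list of node names, one entry per running replica.
    healthy_nodes: set of nodes currently able to run work.
    """
    # Replicas on nodes that went away are considered lost.
    running = [node for node in placements if node in healthy_nodes]
    if not healthy_nodes:
        return running  # nowhere to schedule anything

    load = Counter(running)
    # Scale up: place each missing replica on the least-loaded healthy node.
    while len(running) < desired_replicas:
        target = min(sorted(healthy_nodes), key=lambda node: load[node])
        running.append(target)
        load[target] += 1
    # Scale down: drop any extras.
    return running[:desired_replicas]


# Three replicas spread over three nodes, then node-b goes down.
before = ["node-a", "node-b", "node-c"]
after = reconcile(3, before, healthy_nodes={"node-a", "node-c"})
print(after)
```

With Docker + systemd, the "node-b goes down" case is exactly where manual intervention was needed; here the lost replica is simply placed somewhere else on the next pass.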

It all feels more real now, instead of just a tool or command line. The architecture and extensibility make more sense beyond just YAML.

It reminds me of how web development is taught.

At first, a browser is just:

something that loads a webpage

Later you realize:

there's a rendering engine
networking layer
JS engine
parsing, reflow, painting

Same browser.
Different understanding.

Feels like I've unlocked a deeper understanding of Kubernetes in a similar way.

And maybe that's why that old Docker + systemd thought came back.

Not because it was wrong.

But because now I can finally see where it fits and where it stops.

Understanding Network Devices: How the Internet Reaches Your Device

Dhruv Bhartia — Fri, 30 Jan 2026 18:30:00 GMT

We all use the internet today.

The most common way is via Wi-Fi or mobile data, with Wi-Fi being the most prominent.
Ever wondered how your device actually gets that Spotify song, YouTube video, or even shows you this particular blog?

It often feels like we are in a direct connection with Spotify or YouTube from our system - but that’s not the case.

There are multiple layers and multiple devices involved before a request from your device reaches a server and comes back with a response.

It’s also not that we weren’t connected to the world before the internet. We already had means like telephones and cable TV.
The internet is just another medium - one that expanded the horizon of what kind of data can be shared over an existing connection to the world.

Let’s break down the key devices that make this entire journey possible.

Modem - Translating Languages Between Worlds

In the earlier days of the internet, computers were connected using telephone lines:

Telephone Line → Modem → Ethernet Cable → Computer

We already had telephone connections with companies that were well connected across regions and countries.
The only issue was the language of communication.

Telephone lines understood analog signals
Computers spoke in digital signals

Telephones didn’t transmit our voice as-is.
The microphone converted sound into analog signals, sent them over the wire, and the receiver converted them back into sound.

Computers needed something similar - a way to convert their digital data into a form that telephone wires could understand.

This is where the modem came into existence.

A Modem (Modulator–Demodulator):

Converts digital signals → analog signals
Converts analog signals → digital signals

This translation allowed computers to communicate using existing telephone infrastructure.
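To make the translation idea concrete, here is a minimal sketch that maps each bit to a tone and back again. It is a deliberate simplification: the two frequencies are placeholder values, a "signal" is just a list of numbers, and real modems use more elaborate schemes.

```python
# Illustrative tone frequencies in Hz; the exact values are placeholders.
TONE_FOR_BIT = {"0": 1070, "1": 1270}
BIT_FOR_TONE = {tone: bit for bit, tone in TONE_FOR_BIT.items()}


def modulate(data: bytes) -> list[int]:
    """Digital -> 'analog': turn every bit into a tone frequency."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return [TONE_FOR_BIT[bit] for bit in bits]


def demodulate(tones: list[int]) -> bytes:
    """'Analog' -> digital: turn tones back into bits, then into bytes."""
    bits = "".join(BIT_FOR_TONE[tone] for tone in tones)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


signal = modulate(b"hi")
print(signal[:8])          # the eight tones that carry the letter "h"
print(demodulate(signal))  # the original bytes come back out
```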

Router – Making Sure Traffic Reaches the Right Device

As time passed, the number of devices increased and Wi-Fi came into existence.

Now a single household might have:

Phones
Laptops
TVs
Tablets

We can’t give each device its own internet connection like mobile phones do with SIM cards.

So the question became:

How do we connect multiple devices using a single internet connection?

If a single device is requesting data, it’s easy.
But with multiple devices, we need a way to identify who requested what.

Imagine:

You are watching a football match
Someone else in your home is watching cricket
Suddenly you start getting the cricket stream

That’s exactly what a router prevents.

A router:

Keeps track of devices on the local network
Ensures responses go back to the same device that requested them (home routers typically do this using NAT, Network Address Translation)
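Home routers usually solve the "who requested what" problem with a translation table (NAT). The sketch below is a toy version of that table, not a real router implementation: the `NatTable` class, the starting port number, and all IP addresses are assumptions chosen for illustration.

```python
class NatTable:
    """Toy model of how a home router remembers who asked for what."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.by_public_port = {}  # public port -> (private ip, private port)
        self.by_private = {}      # (private ip, private port) -> public port

    def outbound(self, private_ip: str, private_port: int):
        """Rewrite an outgoing request's source to the shared public address."""
        key = (private_ip, private_port)
        if key not in self.by_private:
            self.by_private[key] = self.next_port
            self.by_public_port[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.by_private[key]

    def inbound(self, public_port: int):
        """Find which device a response belongs to; None if nobody asked."""
        return self.by_public_port.get(public_port)


nat = NatTable("203.0.113.7")
_, football_port = nat.outbound("192.168.1.10", 51000)  # your laptop
_, cricket_port = nat.outbound("192.168.1.11", 51000)   # someone else's TV
print(nat.inbound(football_port))
print(nat.inbound(cricket_port))
```

Both devices share one public IP, but each request gets its own public port, so the football stream and the cricket stream never get mixed up.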

This routing logic isn’t limited to homes.
Routers exist at all levels of the internet, connecting one logical network to another.

We’ll revisit this idea again later.

Router vs Modem – Do Routers Replace Modems?

It often feels like routers have replaced modems.

We commonly say:

“I have a fibre Wi-Fi router at home”

That statement is partially true.

If we recall what a modem actually does - signal translation - we’ll realize that routers never took over that responsibility.

A router manages and routes traffic (like a traffic policeman)
A modem translates signals so they can travel over physical media

What changed is integration.

Today:

Signal translation happens using light signals over fibre
This role is handled by a device called an ONT (Optical Network Terminal)
The modem functionality is hidden inside the same device

The idea remains the same - there is still a translator.

Analogy:

Earlier: a human translator
Now: Google Translate

Both do the same job; the implementation evolved.

So:

Modem and router both exist
They just live inside the same physical device in most homes today

Switch and Hub - Local Network Communication

So far, we’ve talked about connecting to the internet.

But sometimes we only need local communication, within a limited area like:

An office
A data center
A home network

For this, we use switches and hubs.

Routers can do this, but they are overkill.

A simple analogy:

Finding an item directly in a box
Versus opening a box, finding another box, and then finding the item

Switches are Layer 2 (L2) devices
Routers are Layer 3 (L3) devices

More layers = more processing.

Hub

A hub is like a railway station announcement.

Any message sent is broadcast to all ports
Every connected device receives it
Even if the message is meant for just one device

Switch

A switch is a smart hub.

It knows which devices are connected to which ports
Sends data only to the intended recipient

Analogy:

Hub → General announcement at a railway station
Switch → Airport announcements made for a specific gate

This makes switches faster and more efficient for local networks.
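The difference between the two can be sketched in a few lines. This is a simplified model: port numbers and the shortened MAC addresses are made up, and real switches also handle details such as aging out old table entries.

```python
def hub_forward(in_port, ports):
    """A hub repeats the frame on every port except the one it came from."""
    return [port for port in ports if port != in_port]


class Switch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def forward(self, in_port, src_mac, dst_mac):
        # Learn which port the sender is connected to.
        self.mac_table[src_mac] = in_port
        # Known destination: send to that one port only.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # Unknown destination: fall back to flooding, like a hub.
        return [port for port in self.ports if port != in_port]


switch = Switch(ports=[1, 2, 3, 4])
first = switch.forward(1, "aa:aa", "bb:bb")   # bb:bb not learned yet
reply = switch.forward(2, "bb:bb", "aa:aa")   # aa:aa was learned on port 1
second = switch.forward(1, "aa:aa", "bb:bb")  # bb:bb is now known on port 2
print(first, reply, second)
```

The switch only behaves like a hub until it has seen each device speak once; after that, frames go to a single port.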

Firewall – Where Security Lives

Until now, we assumed:

We send data
Others receive it willingly

But what if:

Someone accesses data you never intended to share?
Someone sends commands to your device from anywhere in the world?

These are real problems.

Devices often respond to commands by default once they receive them.

That’s why we add security layers, such as:

Firewall
IDS / IPS
IAM
WAF
Antivirus
SIEM

A Firewall is one of these mechanisms.

Firewalls can operate at:

Layer 3
Layer 4
Layer 7
Or combinations of these

Analogy:

Some guards check badges
Some check bags
Some even listen to conversations

In general, when we say firewall, we usually mean L3 / L4 firewalls.

They:

Check source and destination IP
Check ports
Check protocol type
Track connections
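The first three checks can be sketched as a small rule matcher. This is a stateless simplification (it does not track connections), and the `Rule` structure, the "first match wins" order, and the default-deny policy are assumptions for illustration rather than how any particular firewall works.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Rule:
    action: str               # "allow" or "deny"
    source: str               # CIDR range the packet must come from
    protocol: str             # "tcp", "udp", or "any"
    dest_port: Optional[int]  # None matches any port


def decide(rules, src_ip, protocol, dest_port, default="deny"):
    """First matching rule wins; anything unmatched gets the default."""
    for rule in rules:
        if ip_address(src_ip) not in ip_network(rule.source):
            continue
        if rule.protocol not in ("any", protocol):
            continue
        if rule.dest_port is not None and rule.dest_port != dest_port:
            continue
        return rule.action
    return default


RULES = [
    Rule("allow", "0.0.0.0/0", "tcp", 443),  # HTTPS from anywhere
    Rule("allow", "10.0.0.0/8", "tcp", 22),  # SSH only from the internal range
]

print(decide(RULES, "198.51.100.4", "tcp", 443))
print(decide(RULES, "198.51.100.4", "tcp", 22))
```

"Allowing all traffic" is the equivalent of changing `default` to "allow": every packet the rules did not anticipate now gets through.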

This is why blindly disabling firewalls or allowing all traffic - just because a tutorial said so - is dangerous.

Your app might not be malicious, but opening everything can allow malicious actors into your system.

Security helps keep your application available and running.
But availability isn’t only about security - which brings us to the next component.

Load Balancer – Handling Scale and Availability

Think about:

A small local shop
A large supermarket

Now imagine two groups of 1000 people:

Group A goes to the supermarket
Group B goes to the local shop

Who gets served faster?

Most likely, Group A.

Why?

Supermarkets have:

Larger area
Well-defined sections
Multiple entry and exit points
Multiple billing counters

Local shops:

Small space
Single entry/exit
One billing counter

Local shops aren’t badly designed - they just aren’t meant to serve large traffic at the same time.

Now imagine:

A supermarket with multiple billing counters
But everyone lines up at a single counter

That defeats the purpose.

To fix this, we need someone directing people to different counters.

That person is the Load Balancer.

A load balancer:

Sits in front of backend servers
Distributes incoming traffic across multiple instances

Scaling backend servers alone isn’t enough.
Users don’t know:

Which instance exists
Which instance is free
Which instance is overloaded

So users always hit the load balancer, and it decides where the request should go.

Load balancers use different algorithms, such as round robin or least connections, depending on the needs and the intelligence required.
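Two common strategies, round robin and least connections, can be sketched as below. The class names and the "counter" backends are made up to match the supermarket analogy; a real load balancer also handles health checks, timeouts, and much more.

```python
from itertools import cycle


class RoundRobin:
    """Send each new request to the next backend in a fixed rotation."""

    def __init__(self, backends):
        self._rotation = cycle(backends)

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Send each new request to the backend with the fewest active requests."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1


rr = RoundRobin(["counter-1", "counter-2", "counter-3"])
rr_order = [rr.pick() for _ in range(4)]

lc = LeastConnections(["counter-1", "counter-2"])
first = lc.pick()    # both idle, so the first counter is chosen
second = lc.pick()   # counter-1 is busy, so counter-2 is chosen
lc.release(second)   # the customer at counter-2 is done
third = lc.pick()    # counter-2 is free again
print(rr_order, first, second, third)
```

Round robin needs no knowledge of the backends, while least connections needs to know how busy each one is: that is the "intelligence required" trade-off.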

How Everything Works Together – End-to-End Flow

Let’s trace how your request reached this blog.

Assume you are reading this from your office computer.

Flow

Open the browser
Enter the URL
Request goes via Ethernet connection to the switch
Switch forwards it to the router (where modem translates the signal)
Router sends the request to the ISP
Request passes through multiple routers
Reaches the ISP router of the hosting server
Router routes and translates the signal back to digital
Traffic goes to the switch
Switch sends it to the server
Server hosts a load balancer
Load balancer routes request to the appropriate backend server

The internal LAN and switches may be physical or virtual, and may not map 1:1 - but the logic remains the same.

Sum Up

All these devices:

Modem
Router
Switch
Hub
Firewall
Load Balancer

Are interesting in their own way and deserve deeper dives.

This write-up was meant to touch upon their responsibilities, how they differ, and how they work together to deliver something as simple as this blog to your screen.

How DNS Resolution Works (Using dig to See What’s Actually Happening)

Dhruv Bhartia — Fri, 30 Jan 2026 18:30:00 GMT

In the last article, we discussed various network devices and the basics of how a request travels across the internet.

If you’re interested, you can read it here:

Understanding Network Devices: How the Internet Reaches Your Device

In this article, we’ll focus on DNS - what it is, why it exists, and how name resolution actually happens under the hood.

What Is DNS and Why Name Resolution Exists

DNS stands for Domain Name System.

As the name suggests, it is a system that holds information about domain names.

Domain names are simply the website names we type into applications - like google.com, youtube.com, etc.

In computer networks, all communication happens using IP addresses.
IP addresses are used on the internet to identify devices in the network. They do not have human-friendly names.

Just like our houses or flats have an address, IP addresses serve the same purpose for computers and network devices (Layer 3 and above).

At first glance, DNS feels similar to a phonebook - and that analogy works - but in a more accurate sense, it behaves closer to a distributed database.

Phones understand numbers, not names.
When you select a person’s name in your phone, the phone application automatically translates that name into a phone number.

DNS does the same thing for the internet.

You type a website name, and DNS returns the corresponding IP address to your application (browser, curl, Postman, etc.), which is then used to make the actual network request.

DNS is a massive system.

Think about an IPL final or a big e-commerce sale. A lot of preparation goes into handling that traffic. But every single user visiting those platforms must first resolve the domain name.

Even then, the comparative load on DNS is surprisingly low.
That’s the beauty of how this system is designed.

Most of the complexity is hidden from us by browsers and tools - but we do have ways to see what’s going on underneath.

Introducing `dig`

dig is a tool used to interact directly with DNS and inspect its responses.

It’s a CLI tool.

Here’s what happens when we run:

dig google.com

;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             183     IN      A       142.251.221.142

This tells us that google.com has an A record pointing to the IP 142.251.221.142. The 183 is the TTL: the number of seconds this answer may be cached.

That is a public IP for Google.
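Each line in the ANSWER SECTION has five fields: name, TTL in seconds, class, record type, and data. A small sketch that splits the line shown above into those fields (the `Answer` type and `parse_answer_line` helper are names made up for this example):

```python
from typing import NamedTuple


class Answer(NamedTuple):
    name: str
    ttl: int           # seconds the answer may be cached
    record_class: str  # "IN" means Internet
    record_type: str   # "A", "AAAA", "NS", ...
    data: str


def parse_answer_line(line: str) -> Answer:
    """Split one ANSWER SECTION line into its five fields."""
    name, ttl, record_class, record_type, data = line.split(maxsplit=4)
    return Answer(name, int(ttl), record_class, record_type, data)


record = parse_answer_line(
    "google.com.             183     IN      A       142.251.221.142"
)
print(record)
```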

Try this yourself:

Run dig google.com
Copy the IP from the output
Paste it directly into your browser

Observe what happens.

DNS Is Not a Simple Database Lookup

At a surface level, DNS might look like a simple database query:

“Give me the IP for this domain.”

But that’s not the reality.

Imagine storing hundreds of billions of records and querying them globally for billions of devices, all in near-real time.

That approach wouldn’t scale.

The Distributed Structure of DNS

DNS works because it is distributed.

From top to bottom, the hierarchy looks like this:

Root Servers → TLD Servers → Authoritative Name Servers

Each layer has a very specific responsibility.

Root Name Servers

Root servers are the top-most servers in the DNS hierarchy.

They do not store IP addresses for websites.

Instead, they store information about TLD (Top Level Domain) servers.

Based on the domain you’re trying to resolve, the root server directs you to the appropriate TLD server.

There are 13 root server identities globally (a through m), each served by many physical machines around the world.

We can see them using:

dig . NS

This returns entries like:

a.root-servers.net.
b.root-servers.net.
...
m.root-servers.net.

These are the starting points for DNS resolution.

TLD Name Servers (`.com`, `.in`, `.dev`, etc.)

TLD servers store information about authoritative name servers.

Examples of TLDs:

.com
.in
.dev
.ai

To inspect .com TLD servers, we can run:

dig com NS

This returns servers such as:

a.gtld-servers.net.
b.gtld-servers.net.
...
m.gtld-servers.net.

At this layer, DNS still does not return IP addresses for websites.

Instead, it tells us where to find the authoritative servers for a given domain.

Authoritative Name Servers

Authoritative name servers are the servers that actually store the DNS records for a domain.

This includes:

A records
AAAA records
(other record types not discussed here)

To learn more about record types, see the article: How Does a Browser Know Where a Website Lives?

For google.com, we can inspect them using:

dig google.com NS

This returns:

ns1.google.com.
ns2.google.com.
ns3.google.com.
ns4.google.com.

These are the servers that finally know the IP addresses for google.com.

Querying one of these servers gives us the IP we need.

The Full DNS Resolution Flow (`google.com`)

If we assume no caching, DNS resolution for google.com follows this path:

Root Server → .com TLD Server → google.com Authoritative Server → IP

This can be visualized using:

dig google.com +trace

The output clearly shows:

Root servers responding first
.com TLD servers responding next
Google’s authoritative servers responding last with the A record

This trace represents the entire DNS resolution chain.

The Role of the Recursive Resolver

The client (browser or application) does not perform all these steps itself.

There is another component in between called the recursive resolver.

Flow looks like this:

Browser / App → Recursive Resolver → DNS Hierarchy → Resolver → App

The application sends a DNS query to the recursive resolver.

The resolver:

Talks to root servers
Talks to TLD servers
Talks to authoritative servers
Collects the final answer
Returns the IP to the application
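The resolver's job can be sketched as a loop that keeps following referrals until some server returns an address. Everything here is hypothetical: the in-memory `SERVERS` table stands in for real name servers, and the domain and IP are reserved documentation values.

```python
# Hypothetical "servers": each one either refers us onward or knows the answer.
SERVERS = {
    "root": {"referrals": {"com.": "tld-com"}},
    "tld-com": {"referrals": {"example.com.": "ns-example"}},
    "ns-example": {"addresses": {"example.com.": "192.0.2.10"}},
}


def in_zone(name, zone):
    """True if name is the zone itself or lives underneath it."""
    return name == zone or name.endswith("." + zone)


def resolve(name, start="root", max_hops=10):
    """Walk root -> TLD -> authoritative, recording each server asked."""
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        zone_data = SERVERS[server]
        if name in zone_data.get("addresses", {}):
            return zone_data["addresses"][name], path
        # No address here: follow the most specific matching referral.
        matches = [z for z in zone_data.get("referrals", {}) if in_zone(name, z)]
        if not matches:
            return None, path
        server = zone_data["referrals"][max(matches, key=len)]
    return None, path


address, path = resolve("example.com.")
print(address, path)
```

The application only ever sees the final address; the three hops in `path` are the resolver's work.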

Caching in DNS

Caching is one of the most important reasons DNS performs so efficiently.

Caching happens at multiple levels:

Browser
Operating System
Recursive Resolver

Resolvers are heavily relied upon, so caching at this layer has a huge impact on performance.
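A resolver-side cache can be sketched as a dictionary whose entries expire. This is a minimal illustration: the `DnsCache` class, the fake clock, the fake upstream lookup, and the address are all assumptions, and the only rule modeled is "reuse the answer until its TTL runs out".

```python
import time


class DnsCache:
    """Toy resolver cache: answers are reused until their TTL runs out."""

    def __init__(self, lookup, clock=time.monotonic):
        self.lookup = lookup  # function: name -> (ip, ttl_seconds)
        self.clock = clock
        self.entries = {}     # name -> (ip, expires_at)
        self.upstream_queries = 0

    def resolve(self, name):
        cached = self.entries.get(name)
        if cached and cached[1] > self.clock():
            return cached[0]            # cache hit
        self.upstream_queries += 1      # cache miss or expired entry
        ip, ttl = self.lookup(name)
        self.entries[name] = (ip, self.clock() + ttl)
        return ip


# A fake clock and a fake upstream make the behaviour easy to follow.
now = [0.0]
cache = DnsCache(lambda name: ("192.0.2.10", 183), clock=lambda: now[0])
cache.resolve("example.com.")  # miss: asks upstream
cache.resolve("example.com.")  # hit: answered from memory
now[0] = 200.0                 # more than 183 seconds later
cache.resolve("example.com.")  # expired: asks upstream again
print(cache.upstream_queries)
```

Three lookups caused only two upstream queries. At the scale of a shared resolver, that ratio is what keeps the load on the DNS hierarchy low.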

Operating systems also have their own DNS configuration, such as the hosts file (/etc/hosts on Linux and macOS), which is typically checked before external resolution.
This is rarely used in day-to-day browsing but is extremely useful for personal testing and overrides.

Note: The command outputs shown are trimmed to highlight the main part of the response. The actual output of these commands contains much more detail.

How Does a Browser Know Where a Website Lives?

Dhruv Bhartia — Fri, 30 Jan 2026 18:30:00 GMT

When you type a website name like example.com into your browser, the browser somehow figures out which server on the internet actually hosts that website.

But how?

This is where DNS comes into play.

What is DNS?

DNS (Domain Name System) is like a phonebook for the internet.

Humans remember names like google.com
Computers communicate using IP addresses like 142.250.183.14

DNS helps resolve domain names to their corresponding IP addresses, so browsers know where to send requests.

DNS is not a single system replying with an IP address.
There are recursive calls made across multiple servers to finally find the correct IP.

If you’re interested in seeing this entire flow in action, you can check out this article:

How DNS Resolution Works

Why DNS Records Are Needed

Now here’s an important question:

How does the recursive resolver know whether it has found the actual website IP or just the next server it should ask?

This is exactly why DNS record types exist.

Each DNS record type tells the resolver what the response means and what to do next.

NS Record - Who Is Responsible for This Domain?

An NS (Name Server) record tells the resolver where to look next.

NS records point the resolver toward the authoritative name server
They do not return the website IP
They tell the resolver:
→ “You need to ask this server for more information about the domain”

A and AAAA Records - The Actual Address

Once the resolver finds these records, the search stops.

A Record → IPv4 address of the website
AAAA Record → IPv6 address of the website

These records contain the actual IP address of the domain.

When the resolver sees an A or AAAA record:

It knows it has reached the destination
It returns the IP address back to the browser

CNAME Record - One Name Pointing to Another

A CNAME (Canonical Name) record is used to create alias names.

A common example:

You host your website on Vercel
Vercel gives you a subdomain
You don’t control the public IP

So instead of pointing to an IP:

Your domain uses a CNAME record
It points to Vercel’s domain

What happens internally?

The resolver sees the CNAME record
It starts a new lookup for the target domain
That lookup eventually resolves to an A or AAAA record
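The internal steps above can be sketched as a loop that keeps following aliases until it finds an address record. The record table, domain names, and IP below are placeholders (reserved example values), and the depth limit is an assumption added to guard against alias loops.

```python
# Hypothetical records; the names and the address are placeholders.
RECORDS = {
    "www.myblog.example.": ("CNAME", "myblog.hosting.example."),
    "myblog.hosting.example.": ("A", "192.0.2.44"),
}


def resolve_address(name, records, max_depth=8):
    """Follow CNAME aliases until an A or AAAA record is found."""
    chain = [name]
    for _ in range(max_depth):  # guard against alias loops
        record = records.get(name)
        if record is None:
            return None, chain
        record_type, value = record
        if record_type in ("A", "AAAA"):
            return value, chain  # reached an actual address
        if record_type == "CNAME":
            name = value         # start a new lookup for the target
            chain.append(name)
            continue
        return None, chain       # some other record type
    return None, chain


ip, chain = resolve_address("www.myblog.example.", RECORDS)
print(ip, chain)
```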

MX Record - How Emails Find the Right Server

Email delivery works differently from web traffic.

That’s why MX (Mail Exchange) records exist.

MX records specify which server should receive emails
They help distinguish between:
- Web servers
- Mail servers

If MX records don’t exist:

A/AAAA records act as fallback
But then the web server must also handle mail logic => Web Server + SMTP server

To keep things simple and clean:

MX records clearly define mail routing
Web servers stay focused on serving websites

TXT Record - Extra Information & Verification

TXT records are not part of normal website routing.

They store extra information, mostly for verification purposes.

Common use cases:

SSL certificate issuance
Domain ownership verification

Flow:

Certificate Authority (CA) asks for proof of domain ownership
You add specific content to a TXT record
CA checks the TXT record
Certificate is issued if verification passes

At a high level:

TXT records are to DNS what meta tags are to HTML.

How All DNS Records Work Together (End-to-End Example)

Let’s put everything together.

Assume:

You request a website
There is no cache

Step-by-step flow:

Resolver asks the root name server
Root replies with an NS record for the TLD
Resolver queries the TLD server
TLD replies with an NS record for the authoritative name server
Resolver queries the authoritative name server
Authoritative server replies with a CNAME record mapped to Vercel
The resolver performs a fresh lookup for the target domain, using cache where possible.
Resolver queries Vercel’s authoritative name server
Gets an A record
Resolver returns the IP to the browser

At this point, the browser finally knows where the website lives.

Getting Started with cURL

Dhruv Bhartia — Fri, 30 Jan 2026 18:30:00 GMT

As a developer, whatever we code eventually gets deployed on a server.
A server is nothing but a program that runs somewhere and serves our application to users.

Users usually interact with this server through a browser. The browser sends requests to the server and shows the response in a user-friendly way.

But as developers, we often want to test backend changes without a UI. Testing APIs or backend logic only through a browser is either impossible or very inconvenient.

This is where cURL comes in.

What is cURL?

cURL is a command-line tool that allows us to make requests to a server.

In simple terms:

cURL lets you send messages to a server directly from the terminal.

It is very useful for developers because:

It helps in testing backend APIs
It helps in troubleshooting server issues
It allows testing functionality independent of the UI

cURL supports multiple protocols, but in this article we will focus only on HTTP / HTTPS.

Even though it might look complex or scary because it’s a CLI tool, it is actually quite simple to use.

Making Your First Request Using cURL

Let’s start with the most basic example:
Fetching a webpage from a locally running nginx server.

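root@db997926b412:/# curl localhost
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>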

Here, cURL fetched the raw HTML from the server.

If we open the same URL in a browser, we see a nicely rendered page.
The content is the same - the difference is that the browser knows how to render HTML, while cURL just prints it as text.

Understanding the Response (Headers and Body)

To see more details about what’s happening behind the scenes, we can ask cURL to show a verbose response.

curl -v localhost

*   Trying 127.0.0.1:80...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.27.5
< Date: Sat, 31 Jan 2026 14:55:07 GMT
< Content-Type: text/html
< Content-Length: 615
< Last-Modified: Wed, 16 Apr 2025 12:01:11 GMT
< Connection: keep-alive
< ETag: "67ff9c07-267"
< Accept-Ranges: bytes
<

...

From this output we can clearly see:

The request being sent (GET /)
The response status (200 OK)
The headers sent by the server
The response body (HTML content)
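In verbose output, lines starting with `>` are what cURL sent, lines starting with `<` are what the server replied, and lines starting with `*` are connection notes from cURL itself. A small sketch that separates the three (the `split_verbose` helper and the shortened sample are made up for this example):

```python
def split_verbose(output: str):
    """Group `curl -v` lines by prefix: '>' request, '<' response, '*' info."""
    groups = {">": [], "<": [], "*": []}
    for line in output.splitlines():
        prefix, _, rest = line.partition(" ")
        if prefix in groups and rest.strip():
            groups[prefix].append(rest.strip())
    return groups[">"], groups["<"], groups["*"]


SAMPLE = """\
*   Trying 127.0.0.1:80...
> GET / HTTP/1.1
> Host: localhost
>
< HTTP/1.1 200 OK
< Content-Type: text/html
<
"""

request, response, info = split_verbose(SAMPLE)
status_code = int(response[0].split()[1])
print(request[0], status_code)
```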

What Happens When a Page Doesn’t Exist?

If we try to access a path that doesn’t exist, the server responds with an error.

curl -v localhost/path

< HTTP/1.1 404 Not Found
< Server: nginx/1.27.5
...

This shows a 404 error, meaning the requested resource was not found.

All the requests we made so far were GET requests, which we can also confirm from the request headers.

Using cURL to Talk to APIs

cURL is not limited to fetching web pages.
It is commonly used to test APIs.

Below is an example of using cURL to test a POST API for user registration.

The response shows a 200 status code, which means the request was successful.

Next, we try logging in using the newly created user.

The login also works successfully.

Now let’s try passing invalid credentials.

This time, the API responds with an error, which confirms that we are properly testing different scenarios.

Using cURL, we are able to test APIs directly from the terminal, without needing any UI.

Basic cURL Flags Used

In the API examples above, we used a few common cURL flags. Let’s understand them:

--url
Tells cURL the URL to which the request should be sent.
--request
Specifies the HTTP method to use (for example, POST).
--header
Used to pass headers along with the request.
Multiple headers can be passed by using this flag multiple times.
--data
Used to send the request body or payload expected by the API.
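To see how the four flags fit together, here is a sketch that assembles a cURL command from its parts. The endpoint URL and the payload are hypothetical and exist only for illustration; the flags themselves are the ones described above.

```python
import json
import shlex


def build_curl_command(url, method="GET", headers=None, body=None):
    """Assemble a curl command line from the four flags described above."""
    args = ["curl", "--url", url, "--request", method]
    for name, value in (headers or {}).items():
        args += ["--header", f"{name}: {value}"]  # one --header per header
    if body is not None:
        args += ["--data", json.dumps(body)]
    return args


# Hypothetical endpoint and payload, for illustration only.
command = build_curl_command(
    "http://localhost:8080/api/register",
    method="POST",
    headers={"Content-Type": "application/json"},
    body={"username": "alice", "password": "example-password"},
)
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the `command` list can be passed to `subprocess.run` if cURL is installed.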

There are many more flags and ways to use cURL, but this article focuses only on the basics needed to get started.

Why Version Control Exists: The Pendrive Story Every Developer Has Lived

Dhruv Bhartia — Fri, 16 Jan 2026 18:30:00 GMT

General Mindset: Asking Why Before Using Any Tool

Before jumping into any new tool or technology, I like to pause and ask a few basic questions:

Why do I need this tool?
Why does this exist?
What problem does it actually solve?
Can I work without it?

This article is written with the same mindset.

It is not an explanation of Git or how Git works internally.
Instead, it is an attempt to understand why version control systems exist in the first place, and what problems they were designed to solve.

Life Before Version Control Systems

Before version control systems became mainstream, developers still built software, collaborated in teams, and shipped projects. But the way collaboration happened was very different.

Code was commonly shared using:

Pendrives
Emails with ZIP attachments
Shared folders
Files and directories named like:
- final
- final_v2
- latest_final
- final_latest_really

At a small scale, this seems manageable.
But as soon as more than one person starts working in parallel, things begin to break in subtle ways.

To understand this better, let’s look at a simple story.

The Pendrive Story: Three Friends, One Project

The Setup

Three friends - Alice, Bob, and Charlie - are in college and decide to build a personal profile website together.
They do not know about any version control system.

The application has three components:

Navbar
Profile section
Blogs section

They divide the work as follows:

Alice works on the profile
Bob works on the blogs page
Charlie takes the navbar and index page

Charlie finishes early. He copies the code to a pendrive and passes it to Alice.
Alice copies it to her local system and then passes the pendrive to Bob.

At this point, Alice and Bob are working independently on their assigned features.

Where Things Start Breaking

While Alice and Bob are working, Charlie becomes free and decides to improve the navbar styling.
He asks Bob for the pendrive.

At the same time:

Bob finishes his blogs page and adds it to the pendrive
Charlie copies his updated navbar to the pendrive

Curious to see how things look together, Charlie runs the code with the navbar and blogs page.
He notices some bugs and misalignments in the blogs page and does a quick fix.

Meanwhile, Bob also finds a few issues in the blogs page and fixes them on his system.

Bob asks for the pendrive back.
Charlie passes it to him, along with a note explaining the changes he made in the blogs page.

Now Bob is stuck.

He can’t simply replace his entire blogs folder anymore.
Some issues are already fixed by Charlie, while others conflict with Bob’s own logic.

To deal with this, Bob creates a new folder called _latest, compares both versions, and manually reuses parts of the code.

Silent Overwrites and Lost Changes

During all this, Alice finishes her work and asks for the pendrive.

Bob passes it to her.
Alice copies her profile code to the pendrive.

While reviewing the code, Alice notices a small typo in the index page and fixes it.

Later, Bob completes his remaining fixes and asks for the pendrive again so they can finalize the project.

Alice shares the pendrive.
Bob replaces all the files on the pendrive with the code from his _latest folder, assuming Alice’s work would not be impacted.

When all three check the final code, Alice notices something surprising:
the typo she fixed in the index page is gone.

After discussion, Bob realizes he unintentionally overwrote Alice’s fix.
Alice had forgotten to leave a note about the index page change, and Bob had no way of knowing it existed.

The fix was silently lost.

What This Story Shows Us

Even for a very small project, collaboration became difficult.

Problems that appeared:

Code getting overwritten without anyone realizing
Changes getting lost silently
Manual notes becoming a dependency
Folder comparisons like _latest becoming normal
No reliable history of who changed what and when

Everyone tried to be careful.
Still, things broke.

Why This Completely Fails at Scale

As teams grow, this approach becomes impossible to sustain.

One option is to enforce strict discipline:

Always leave notes
Always communicate every change
Be extra careful while copying files

Another option is to treat the pendrive as the only editable source:

If you have it, you can make changes
If you don’t, you wait

This can be replaced by a server where only one person edits at a time.

While this sounds safe, it comes with heavy constraints:

No real parallel work
Long waiting times
False sense of safety

Even systems that allow multiple people to edit at the same time show how difficult it is to get things right in a single attempt.

This is where the illusion becomes clear: people taking turns on one shared copy can look like parallel work, but it is only concurrency.

There is a gap - and manual coordination is not enough to fill it.

The Idea That Changes Everything

Charlie recognizes this gap and comes up with an idea:
an automated tracker called Charlie Code Tracker (CCT).

CCT does a few important things:

Tracks what changed and who made the change
Can be installed on each developer’s local machine
Allows everyone to work freely on their own system
Helps highlight inconsistencies or overlaps during sync
Allows pulling others’ changes and pushing your own

Because changes are tracked automatically, developers no longer need to rely on memory or notes.

This idea removes the restriction of working on a single device and enables safe collaboration.
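
To make the overlap check a bit more concrete, here is a minimal Python sketch of how a tracker like CCT might decide what to do with one file during sync. It is only an illustration under my own assumptions: the function names, the sample page text, and the "push / pull / conflict" labels are made up for this example, and real tools compare far more than whole-file fingerprints.

```python
import hashlib


def digest(text: str) -> str:
    """Fingerprint of a file's content (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_sync(base: str, mine: str, theirs: str) -> str:
    """Compare my copy and the shared copy against the last version we both had."""
    if digest(mine) == digest(theirs):
        return "in-sync"   # both copies already match
    if digest(theirs) == digest(base):
        return "push"      # only I changed the file, safe to copy mine over
    if digest(mine) == digest(base):
        return "pull"      # only the other person changed it, take theirs
    return "conflict"      # both changed it, a human has to look


# The index page from the story: Bob never touched it, Alice fixed a typo.
base_index = "Welcom to our project"
bob_index = "Welcom to our project"
alice_index = "Welcome to our project"

result = classify_sync(base_index, bob_index, alice_index)
print(result)  # pull
```

With a check like this, Bob's copy of the index page would have been flagged as stale instead of silently replacing Alice's fix.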

This Is Why Version Control Exists

This CCT idea represents what a version control system is meant to solve.

Version control systems exist to:

Enable parallel work without overwriting
Maintain a reliable history of changes
Make collaboration safe, traceable, and scalable
Reduce human error instead of relying on discipline alone

Teams may choose to treat a shared server as a coordination point, but the core value lies in tracking history and changes, not just storing files.

To understand how Git handles tracking and history internally, give the next article a read:

How Git Works Internally: Building a Mental Model

Understanding Git: Why It Exists and How We Use It

Dhruv Bhartia — Fri, 16 Jan 2026 18:30:00 GMT

What is Git

Git is a type of Version Control System (VCS).
It is open source and runs on your local machine.

Before diving into Git, it helps to understand why version control exists in the first place. We discussed the idea of VCS in an earlier article.
Give it a read: Why Version Control Exists: The Pendrive Story Every Developer Has Lived

To understand Git better, let’s first look at how version control evolved over time.

Early Ways of Managing Code (Before Git)

Method 1: Local Versioning

A very common early approach was to maintain multiple copies of the same project locally:

project_1  
project_2  
project_3  
project_final  
project_final_2

or sometimes using timestamps.

This approach:

Quickly becomes messy
Leads to confusion over which version is correct
Allows only one person to work comfortably
Lives entirely on a personal device

There is some sense of versioning here, but no real tracking or collaboration.

Method 2: Centralized Version Control (CVCS - e.g., SVN)

A better approach was to introduce a central server.

The server stores project files along with their history
Developers fetch a copy, make changes, and push them back
The server manages the code and its history

This solved collaboration issues but introduced new problems:

You always need connectivity to the server
If the central server fails, the project history is at risk

Method 3: Decentralized Version Control (DVCS)

The centralized approach still had limitations.

A decentralized approach solves this by:

Keeping the entire history on every developer’s machine
Allowing work to happen offline
Enabling sync with any peer, remote, or server at any time
Removing a single point of data loss

Git follows this decentralized approach.

Every developer has:

The full code
The full history
Full tracking information

Git runs locally, while platforms like GitHub, GitLab, and Bitbucket are hosting services built on top of Git.

One interesting design choice in Git is that it does not model a commit as a set of file diffs.
Instead, Git stores snapshots of the repository at a point in time.
Each commit becomes a reference to the state of the entire repository at that moment.

If you want to dig deeper into this, I have linked an article explaining Git internals at the end of this one.

Why Git is Used

Before version control systems existed, managing projects was error-prone and difficult.

When VCS was introduced, it helped - but early implementations came with their own limitations.

The evolution roughly looks like this:

Projects shared using pendrives, emails, or zip files
The idea of version control introduced
Multiple implementations appeared:
- Local VCS
  - Tracking exists
  - No collaboration
  - Everything stays on one machine
- Centralized VCS
  - Easy collaboration
  - Requires constant server access
  - Risk if the server goes down
- Decentralized VCS (Git)
  - Full local history
  - Offline work
  - Independent collaboration
  - No single point of failure

Git became popular because it solved the limitations of both local and centralized systems while keeping collaboration fast and reliable.

Git Basics and Core Terminologies

Below are some common Git terms you’ll frequently come across.

Repository (Repo)

A repository is: Project files + .git directory

The .git directory is where Git stores all tracking and history information.

Working Directory

The working directory is your actual workspace:

Where you see files
Where you edit files
Where changes are made before being tracked

Commit

A commit represents:

A snapshot of the code at a given time
A unique hash
A commit message

A chain of commits forms a linked timeline of the project’s history.

Staging Area

The staging area is a special cache-like area that sits between: Working Directory → Repository

It allows you to:

Select specific changes
Decide what should be included in the next commit

Branch

A branch is a pointer to a commit.

Using branches allows:

Parallel work
Independent experimentation
Multiple lines of development alongside the original

HEAD

HEAD is a reference to:

The current branch
The current commit

In simple terms, it tells Git where you are right now.

Remote

A remote is another copy of the repository, usually hosted on a server such as GitHub or GitLab.

Common Git Commands

Below are some commonly used Git commands grouped by purpose.

Starting & Inspecting

Initialize a repository

git init

Creates a new .git/ directory.

Check repository status

git status

Shows:

Changed files
Staged files
Untracked files

View commit history

git log

Making Changes

Stage a specific file

git add <file>

Stage all changes

git add .

Create a commit

git commit -m "message"

Creates a snapshot from the staged changes.

Branching

List branches

git branch

Create a new branch

git branch <branch-name>

Switch branches

git checkout <branch-name>

Fixing Mistakes

Discard local changes for a file

git restore <file>

Undo last commit but keep changes staged

git reset --soft HEAD~1

Undo last commit and discard changes

git reset --hard HEAD~1

The article below walks through a practical example to explain Git internals. Give it a read:

How Git Works Internally: Building a Mental Model

How Git Works Internally: Building a Mental Model

Dhruv Bhartia — Fri, 16 Jan 2026 18:30:00 GMT

Most Git tutorials focus on commands.
This article focuses on what actually happens inside Git when we run those commands.

Before going forward with this article, you can go through the articles below for a better understanding:

Why Version Control Exists: The Pendrive Story Every Developer Has Lived

Understanding Git: Why It Exists and How We Use It

We’ll explore:

What the .git folder is and why it exists
How Git stores data internally using objects
Why Git commits are called snapshots
How Git tracks changes efficiently

All examples below are based on hands-on experimentation, not theory.

The `.git` Directory: The Heart of Git

When we initialize a Git repository:

git init

Git creates a .git/ directory.

/home/app # ls -la
drwxr-xr-x    6 root root 4096 Jan 17 12:59 .git

Why does `.git` exist?

.git/ stores all information about version tracking
This includes:
- commit history
- branches
- file snapshots
- metadata
If the .git/ directory is lost:
- all Git history and tracking is lost
- files remain, but Git has no memory of them

From Working Directory to Commit

Any change you make flows through these stages:

Working Directory
- New file or modified file
Staging Area
- Changes selected to be recorded
Commit
- A snapshot of the project is stored permanently

At a high level:

Working Directory → Staging Area → Commit

To really understand Git, we need to zoom into what a commit actually contains.

A Simple Repository Walkthrough

Initialize a repo

mkdir app
cd app
git init

Create a file:

touch app.txt
git status

Git shows the file as untracked.

First Commit

git add app.txt
git commit -m "Create App file"

This creates the first commit (root commit).

Checking history:

git log

We now see the starting point of the commit chain.

Git Commit History Is a Chain

Each commit:

Has a unique hash
Stores a reference to its parent commit

This forms a linked structure:

commit → parent → parent → ...

To inspect a commit internally:

git cat-file -p <commit-hash>

Example output:

tree bda94d5297b34fc5391112596c3f6b2926891352
parent 37087ac939b14f57c7b223d0903ffb5cb4d1896a
author ...
committer ...

Add Line 2 in app and new Readme file

What this tells us

A commit stores:

a reference to a tree
a reference to its parent
author and message

So the commit itself does not store file contents directly.
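
As a rough illustration, the text printed by git cat-file -p for a commit can be split into its parts with a few lines of Python. This is a sketch under my own assumptions: parse_commit is a hypothetical helper, it works on the pretty-printed text rather than on Git's stored object, and the author and committer values are left elided exactly as in the output above.

```python
def parse_commit(text: str) -> dict:
    """Split pretty-printed commit text into header fields and the message."""
    header, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message.strip()}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            # A root commit has no parent line; a merge commit has several.
            commit["parents"].append(value)
        else:
            commit[key] = value
    return commit


sample = """tree bda94d5297b34fc5391112596c3f6b2926891352
parent 37087ac939b14f57c7b223d0903ffb5cb4d1896a
author ...
committer ...

Add Line 2 in app and new Readme file
"""

commit = parse_commit(sample)
print(commit["tree"])
print(commit["parents"])
print(commit["message"])
```

Nothing in the parsed result is file content. The commit only names a tree and a parent, which is why we have to follow the tree hash next.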

Tree Objects: Representing Folder Structure

Let’s inspect the tree object:

git cat-file -p <tree-hash>

Output:

100644 blob 3485b695ca9834fcdc2bf439f1c12109b8b54634    README.md
100644 blob 40f9bae6a2073fc65d8e2b618b73534a84317ad7    app.txt

A tree:

represents a directory
maps filenames → blob hashes
can reference other trees (for subdirectories)
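
The filename → hash mapping can be made visible with a small Python sketch that reads the listing above. TreeEntry and parse_tree_listing are names I made up for this example. It parses the human-readable output of git cat-file -p, not the binary format Git keeps on disk.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str
    kind: str          # "blob" for a file, "tree" for a subdirectory
    object_hash: str
    name: str


def parse_tree_listing(text: str) -> list:
    """Turn each 'mode type hash name' line into a TreeEntry."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on whitespace at most 3 times so names with spaces survive.
        mode, kind, object_hash, name = line.split(None, 3)
        entries.append(TreeEntry(mode, kind, object_hash, name))
    return entries


listing = """100644 blob 3485b695ca9834fcdc2bf439f1c12109b8b54634    README.md
100644 blob 40f9bae6a2073fc65d8e2b618b73534a84317ad7    app.txt
"""

entries = parse_tree_listing(listing)
name_to_hash = {entry.name: entry.object_hash for entry in entries}
print(name_to_hash["app.txt"])  # 40f9bae6a2073fc65d8e2b618b73534a84317ad7
```

An entry whose kind is tree would simply point at another listing like this one, which is how nested folders are represented.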

Blob Objects: Actual File Content

Inspecting a blob:

git cat-file -p <blob-hash>

Output:

This is line 1
This is line 2

A blob:

stores only file content
has no filename information
same content → same blob hash
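
To see why the filename plays no role, here is a minimal Python sketch of how a blob hash can be computed. It assumes a repository using the default SHA-1 object format (Git also supports SHA-256 repositories), and git_blob_hash is just a name I picked for the example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash of 'blob <size>' + NUL byte + content; no filename is involved."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# An empty file, like app.txt right after `touch app.txt`.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Two files with identical content get the same blob, whatever they are called.
print(git_blob_hash(b"This is line 1\n") == git_blob_hash(b"This is line 1\n"))  # True
```

You can compare the result for any file with what git hash-object prints for it.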

Why Commits Are Snapshots

After adding a new line and committing again:

echo "This is line 3" >> app.txt
git commit -am "Add line 3 in app"

Inspecting the new commit shows:

a new tree
a new blob for app.txt
same blob hash for README.md

This proves:

Each commit represents a full snapshot
Unchanged files reuse existing blobs
Git optimizes storage automatically

Git does not save “changes” - it saves states.
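
A small Python model of the two commits shows the reuse. This is a sketch, not what Git literally runs: snapshot and blob_id are hypothetical helpers, the README.md content is invented because the article never shows it, and the hashing assumes the default SHA-1 object format.

```python
import hashlib


def blob_id(content: str) -> str:
    """Content hash in the style of a Git blob (SHA-1 format assumed)."""
    data = content.encode("utf-8")
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def snapshot(files: dict) -> dict:
    """A snapshot here is just filename -> blob hash for every file."""
    return {name: blob_id(content) for name, content in files.items()}


readme = "# App\n"  # invented content, only for the example
before = snapshot({"README.md": readme,
                   "app.txt": "This is line 1\nThis is line 2\n"})
after = snapshot({"README.md": readme,
                  "app.txt": "This is line 1\nThis is line 2\nThis is line 3\n"})

reused = sorted(name for name in after if before.get(name) == after[name])
rewritten = sorted(name for name in after if before.get(name) != after[name])
unique_blobs = set(before.values()) | set(after.values())

print(reused, rewritten, len(unique_blobs))  # ['README.md'] ['app.txt'] 3
```

Both snapshots list every file, yet only three blobs exist in total, because the unchanged README.md is stored once and referenced twice.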

Exploring the `.git` Directory

Listing .git/:

ls .git/

output:
HEAD
objects
refs
index
logs
...

For internal understanding, we focus on:

HEAD
refs
objects

HEAD and Branches

cat .git/HEAD

output:
ref: refs/heads/master

HEAD points to:

a branch
which points to a commit

Inspect branch ref:

cat .git/refs/heads/master

output:
28c6f9787e22397050b706616d20e1c8cccbdc89

Creating a new branch:

git checkout -b feature

Now:

refs/heads/
├── master
└── feature

Both branches initially point to the same commit.

This shows:

A branch is just a file containing a commit hash.
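
Because these are plain files, following HEAD by hand is easy to sketch in Python. The example below builds a throwaway folder that imitates the layout shown above, so it does not need a real repository. It assumes loose ref files (Git can also keep refs in a packed-refs file, which this ignores), and resolve_head is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit hash, assuming loose (unpacked) refs."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Normal case: HEAD names a branch, and the branch file holds the hash.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    # Detached HEAD: the file holds a commit hash directly.
    return head


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    heads = git_dir / "refs" / "heads"
    heads.mkdir(parents=True)

    commit = "28c6f9787e22397050b706616d20e1c8cccbdc89"
    (heads / "master").write_text(commit + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    on_master = resolve_head(git_dir)

    # Roughly what creating and switching to `feature` changes on disk.
    (heads / "feature").write_text(commit + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/feature\n")
    on_feature = resolve_head(git_dir)
    branches = sorted(path.name for path in heads.iterdir())

print(on_master == on_feature, branches)  # True ['feature', 'master']
```

Creating a branch costs one tiny file, which is why branching in Git feels instant.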

Objects Directory: Where Git Stores Everything

ls .git/objects/

output:
20 28 34 37 40 9f bd e9 f7 info pack

Each folder:

is named using the first two characters of an object hash
contains files named with the remaining characters

Example:

ls .git/objects/28/

This object corresponds to the commit we inspected earlier using git cat-file.

So:

blobs
trees
commits
all live together in objects/
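
The folder layout can be reproduced with a short Python sketch that writes and reads one loose object in a temporary directory. It assumes the default SHA-1 object format and only covers loose objects (not the pack folder). The function names and the sample content are mine.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, kind: str, content: bytes) -> str:
    """Store '<kind> <size>' + NUL + content, zlib-compressed, under its hash."""
    store = kind.encode() + b" " + str(len(content)).encode() + b"\0" + content
    object_hash = hashlib.sha1(store).hexdigest()
    # First two hex characters name the folder, the remaining ones name the file.
    path = objects_dir / object_hash[:2] / object_hash[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return object_hash


def read_loose_object(objects_dir: Path, object_hash: str):
    """Return (kind, content) for a loose object."""
    path = objects_dir / object_hash[:2] / object_hash[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, content = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    assert int(size) == len(content)
    return kind.decode(), content


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"
    object_hash = write_loose_object(objects_dir, "blob", b"This is line 1\n")
    folders = sorted(path.name for path in objects_dir.iterdir())
    kind, content = read_loose_object(objects_dir, object_hash)

print(folders == [object_hash[:2]], kind, content)
```

Blobs, trees, and commits differ only in the kind written into the header, so they can all share the same folder structure.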

How Git Tracks Changes (Mental Model)

Putting it all together:

git add
- prepares blobs
- updates the staging area
git commit
- creates a tree from staged blobs
- creates a commit pointing to that tree
- links to the parent commit
Branch refs move forward
Old objects remain immutable
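
The whole flow fits into a toy Python model. To be clear about the assumptions: MiniRepo is invented for this article, it keeps everything in memory, and it serializes trees and commits as simple text, so its hashes will not match real Git. The file contents are made up too. It only mirrors the steps listed above.

```python
import hashlib


class MiniRepo:
    """Toy in-memory model of add/commit; not Git's real storage format."""

    def __init__(self):
        self.objects = {}             # hash -> (kind, payload), never changed once stored
        self.index = {}               # staging area: filename -> blob hash
        self.refs = {"master": None}  # branch name -> commit hash
        self.head = "master"

    def _store(self, kind: str, payload: str) -> str:
        object_hash = hashlib.sha1(f"{kind}\0{payload}".encode("utf-8")).hexdigest()
        self.objects.setdefault(object_hash, (kind, payload))
        return object_hash

    def add(self, name: str, content: str) -> None:
        # "git add": prepare a blob and record it in the staging area.
        self.index[name] = self._store("blob", content)

    def commit(self, message: str) -> str:
        # "git commit": build a tree from the staged blobs...
        tree_text = "\n".join(f"{blob} {name}" for name, blob in sorted(self.index.items()))
        lines = [f"tree {self._store('tree', tree_text)}"]
        parent = self.refs[self.head]
        if parent is not None:
            lines.append(f"parent {parent}")  # ...link to the parent commit...
        commit_hash = self._store("commit", "\n".join(lines) + "\n\n" + message)
        self.refs[self.head] = commit_hash    # ...and move the branch ref forward.
        return commit_hash


repo = MiniRepo()
repo.add("app.txt", "This is line 1\n")
first = repo.commit("Create App file")

repo.add("README.md", "readme\n")
repo.add("app.txt", "This is line 1\nThis is line 2\n")
second = repo.commit("Add Line 2 in app and new Readme file")

print(repo.refs["master"] == second, first in repo.objects, len(repo.objects))  # True True 7
```

After the second commit the branch points at the new commit, while the first commit, its tree, and its blob are still sitting untouched in objects.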

Hashes and Integrity

Git uses hashes to:

uniquely identify content
detect corruption
avoid duplicate storage

Same content → same hash
Different content → different hash
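
Both properties fall out of naming data by its hash, which a few lines of Python can show. This is a simplified sketch: it hashes raw bytes without Git's object header, and put and find_corrupt are hypothetical helpers. In a real repository, git fsck is the command that performs this kind of verification.

```python
import hashlib


def object_id(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


store = {}  # object id -> stored bytes


def put(data: bytes) -> str:
    """Store data under its own hash; identical data lands on the same key."""
    oid = object_id(data)
    store[oid] = data
    return oid


def find_corrupt(objects: dict) -> list:
    """Recompute every hash and report entries that no longer match their name."""
    return [oid for oid, data in objects.items() if object_id(data) != oid]


first = put(b"This is line 1\n")
second = put(b"This is line 1\n")   # same content, so no second copy is stored
third = put(b"This is line 2\n")

clean = find_corrupt(store)
store[third] = b"This is line 2 (damaged on disk)\n"   # simulate corruption
damaged = find_corrupt(store)

print(first == second, len(store), clean, damaged == [third])  # True 2 [] True
```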

Final Takeaway

This exploration shows that Git is:

not magic
not command-driven
but a content-addressed snapshot database

Understanding this internal model makes:

branching intuitive
history manipulation safer
Git errors less scary
