
Git: Merge, Cherry-Pick & Rebase

An unconventional guide

Last updated on January 26, 2022


You can use this guide to get a deep understanding of how Git's merges, rebases & cherry-picks work under the hood, so that you'll never fear them again.

(Editor’s note: At ~5500 words, you probably don’t want to try reading this on a mobile device. Bookmark it and come back later. And even on a desktop, read this elephant one bite at a time.)

Introduction

Sure, everyone and their grandmother uses Git and seems to be comfortable with it.

But did you ever botch a merge and then your solution was to delete and re-clone your repository? Without quite knowing what went wrong and why?

Or did a rebase suddenly make tens of merge conflicts pop up, one after another, and you didn’t know what the hell was going on?

In short, do you have nagging doubts whenever it comes to merging, rebasing and cherry-picking?

Fear not, you’ve come to the right place: The remainder of this guide will help you get rid of those fears.

(Teaser: By the end of this article, you’ll understand that a git cherry-pick is essentially just a git merge. And a git rebase is essentially just a series of git cherry-picks. Sounds crazy? Read on!)

Git Storage Internals

Before you jump right into the nitty-gritty details of merging, let’s have a look at how Git stores your files and commits.

It might seem a bit weird to start off with internal details, but take a leap of faith: Those internals are the building blocks for everything else in this guide, so you’ll need to know them first.

Scenario: Committing Two Files

Open up your terminal and execute the following commands.

# create a git repo in a directory of your liking

mkdir gitinternals
cd gitinternals
git init -b main

# add two .txt files and commit them

echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt

git add LICENSE.txt
git add README.txt
git commit -m "Project Setup"

# update README.txt's contents

echo "a git guide" > README.txt

git add README.txt
git commit -m "Updated README"

You created two .txt files in a first commit, then updated the contents of one file (README.txt) in a second commit.

Here’s a question for you: How do you think Git will store those two commits, or rather the two versions of README.txt?

  • Will it store full files, i.e. a marcobehler.com guide AND a git guide, somewhere?

  • Will it store deltas, something like a (-marcobehler.com)(+git) guide (pseudo-code)?

Bonus question: How the hell would the answer to this help with merging or rebasing?

Let’s find out!

Inspecting Git repos: 'git cat-file'

Let’s execute a git log in your repository, and you’ll get output similar to this:

# in your repository's directory
git log

# git log's output (newest commit first)

commit 142e5cf36d9f2047f24341883bd564b1d5170370 (HEAD -> main)
Author: Marco Behler <marco@marcobehler.com>
Date:   Tue Dec 28 09:54:44 2021 +0100

    Updated README

commit 715247c8426d3c16881539118e1eafeb38439b1c
Author: Marco Behler <marco@marcobehler.com>
Date:   Tue Dec 28 09:54:25 2021 +0100

    Project Setup

So far, nothing surprising - you’ll see your two commits. Something that you’ve seen plenty of times, but probably ignored, are the commit ids. Here’s the second commit’s id.

commit 142e5cf36d9f2047f24341883bd564b1d5170370

More specifically, 142e5cf36d9f2047f24341883bd564b1d5170370 is not just a random id, it’s a SHA-1 hash.

But, what exactly has been hashed here?

Instead of spoiling the answer, let’s use another built-in git command: git cat-file. It basically lets you have a look at any object that Git stores in your repository’s .git folder, provided that you know its SHA1-hash. Sounds useful, right?

Execute the following command (and make sure to try this with the SHA1 hash that you are getting for your commit):

# make sure to change the SHA1-hash!
git cat-file -p 142e5cf36d9f2047f24341883bd564b1d5170370

(Note: The -p option makes sure to pretty-print its output.)

You’ll get output similar to this:

# git cat-file's output
tree c4548e069652a6825894699ef7740a620ea0a6a8
parent 715247c8426d3c16881539118e1eafeb38439b1c
author Marco Behler <marco@marcobehler.com> 1641459065 +0100
committer Marco Behler <marco@marcobehler.com> 1641459065 +0100

Updated README

Tada! This is what a commit looks like in Git. It’s a text file with… 6 lines (well, 5, and an empty one to delimit your commit message from the rest). Yes, really.

And if you put those lines into a sha1sum() function, you’ll end up with your SHA1 hash: 142e5cf36d9f2047f24341883bd564b1d5170370!

For the advanced reader, Git doesn't exactly do sha1sum(filecontent), it actually does a sha1sum(header + filecontent) - but we'll cover this in a bit.
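If you can't wait, here is a minimal sketch of that header trick. The header is the object type, a space, the content size in bytes, and a NUL byte. Note that git_object_sha1 is a hypothetical helper written for this guide, not a Git command, and the sketch assumes that GNU coreutils' sha1sum is installed (on macOS, shasum is the usual substitute).

```shell
# Hypothetical helper (not part of Git): hash a file the way Git hashes an object.
# Assumes the `sha1sum` tool from GNU coreutils is available.
git_object_sha1() {
  obj_type="$1"   # object type: blob, tree or commit
  obj_file="$2"   # file holding the raw object content
  obj_size=$(wc -c < "$obj_file" | tr -d ' ')
  # header = "<type> <size in bytes>" followed by a NUL byte, then the content
  { printf '%s %s\0' "$obj_type" "$obj_size"; cat "$obj_file"; } \
    | sha1sum | cut -d ' ' -f 1
}

# Try it with the content that `echo "a git guide"` writes in a Unix shell
tmp=$(mktemp)
printf 'a git guide\n' > "$tmp"
git_object_sha1 blob "$tmp"   # should print the same hash as: git hash-object "$tmp"
rm -f "$tmp"
```

The same helper should work for commits: save the output of git cat-file commit followed by your commit's hash into a file, pass commit as the type, and you should get that very hash back.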

Now, some of those lines from your commit (file) you’ll be familiar with:

# who made the commit?
committer Marco Behler <marco@marcobehler.com> 1641459065 +0100

# what's the commit message?
Updated README

Whereas some other parts of the commit probably look unfamiliar:

tree c4548e069652a6825894699ef7740a620ea0a6a8
parent 715247c8426d3c16881539118e1eafeb38439b1c

Let’s (rightly) assume for now that parent(s) simply references the commit that came before the current commit. Then, what does the tree line stand for? Execute another git cat-file to find out!

# make sure to change the SHA1-hash to that of your tree!
git cat-file -p c4548e069652a6825894699ef7740a620ea0a6a8

Look, this tree seems to be yet another text file, referencing (snapshots of) all the files in your repository at the time of the commit!

100644 blob ddd3b7b6335a636af9a9241096455e834f12f636    LICENSE.txt
100644 blob 773fc76fe191ceff24259d4e66efc90e86093b0c    README.txt

Can this be true? Well, you’ll find out by doing one last git cat-file, this time using README.txt’s hash.

git cat-file -p 773fc76fe191ceff24259d4e66efc90e86093b0c

Which leads to the following output:

"a git guide"

Does this look familiar? Yes, it is a snapshot of your README.txt file at the time of the second commit, i.e. when you updated the readme. So it does look like Git stores the full file contents for every commit (assuming the contents have changed).
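By the way, you don't have to copy hashes around by hand for this walk from commit to tree to blob. Here is a small sketch that rebuilds the scenario in a throwaway directory and uses Git's revision syntax instead. The temp directory and the "Example" identity are placeholders made up for this demo.

```shell
# Throwaway repo that mirrors the guide's two commits
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

# Placeholder identity for this demo only
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt README.txt
git commit -q -m "Project Setup"

echo "a git guide" > README.txt
git add README.txt
git commit -q -m "Updated README"

# -t prints the type of an object instead of its content
git cat-file -t HEAD              # commit
git cat-file -t 'HEAD^{tree}'     # tree
git cat-file -t HEAD:README.txt   # blob

# -p pretty-prints the object, just like before
git cat-file -p HEAD              # the latest commit
git cat-file -p 'HEAD^{tree}'     # the tree that commit points to
git cat-file -p HEAD:README.txt   # README.txt as stored in that tree
```

Replace HEAD with HEAD~1 and you are looking at the commit before it.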

Well, to be sure, let’s repeat the git cat-file game for the first commit (which serves as a great exercise, so refer back to the git log output and repeat the steps!). You’ll end up with something like this:

# cat'ing README.txt snapshotted during the first commit
git cat-file -p fe066d3f7568e13ef031b495e35c94be91b6366c

"a marcobehler.com guide"

Take-Away: Conceptually, Git doesn’t store deltas between commits, it stores snapshots, i.e. the full file, for every commit (as long as the file changed and its SHA1-hash is not already in your repository). Git may later delta-compress objects inside packfiles to save disk space, but that is a storage optimization and doesn't change the snapshot model.

This is also the reason why Git is not a great choice for projects with many binary assets that change frequently.
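You can check the "as long as the file changed" part of the take-away yourself. The sketch below recreates the two commits in a throwaway directory and compares the blob hashes that each commit records. Again, the temp directory and the "Example" identity are placeholders for this demo.

```shell
# Throwaway repo that mirrors the guide's two commits
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

# Placeholder identity for this demo only
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt README.txt && git commit -q -m "Project Setup"
echo "a git guide" > README.txt
git add README.txt && git commit -q -m "Updated README"

# git rev-parse <commit>:<path> prints the blob hash recorded for that path
license_first=$(git rev-parse HEAD~1:LICENSE.txt)
license_second=$(git rev-parse HEAD:LICENSE.txt)
readme_first=$(git rev-parse HEAD~1:README.txt)
readme_second=$(git rev-parse HEAD:README.txt)

[ "$license_first" = "$license_second" ] && echo "LICENSE.txt: same blob in both commits"
[ "$readme_first" != "$readme_second" ] && echo "README.txt: one blob per version"

# 2 commits + 2 trees + 3 blobs = 7 loose objects
git count-objects
```

The unchanged LICENSE.txt is stored exactly once and simply referenced by both trees, whereas every new version of README.txt costs you a full new blob.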


let mut author = ?

I'm @MarcoBehler and I share everything I know about making awesome software through my guides, screencasts, talks and courses.
