# create a git repo in a directory of your liking
mkdir gitinternals
cd gitinternals
git init -b main

## add two .txt files and commit them
echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt
git add README.txt
git commit -m "Project Setup"

## update README.txt's contents
echo "a git guide" > README.txt
git add README.txt
git commit -m "Updated README"
Git: Merge, Cherry-Pick & Rebase
An unconventional guide
You can use this guide to get a deep understanding of how Git's merges, rebases & cherry-picks work under the hood, so that you'll never fear them again.
(Editor’s note: At ~5500 words, you probably don’t want to try reading this on a mobile device. Bookmark it and come back later. And even on a desktop, read this elephant one bite at a time.)
Sure, everyone and their grandmother uses Git and seems to be comfortable with it.
But did you ever botch a merge and then your solution was to delete and re-clone your repository? Without quite knowing what went wrong and why?
Or did a rebase suddenly make tens of merge conflicts pop up, one after another, and you didn’t know what the hell was going on?
In short, do you have nagging doubts whenever it comes to merging, rebasing and cherry-picking?
Fear not, you’ve come to the right place: The remainder of this guide will help you get rid of those fears.
(Teaser: By the end of this article, you’ll understand that a git cherry-pick is essentially just a git merge. And a git rebase is essentially just a git cherry-pick? Sounds crazy? Read on!)
Git Storage Internals
Before you jump right into the nitty-gritty details of merging, let’s have a look at how Git stores your files and commits.
It might seem a bit weird to start off with internal details, but take a leap of faith: Those internals are the building blocks for everything else in this guide, so you’ll need to know them first.
Scenario: Committing Two Files
Open up your terminal and execute the setup commands listed at the very top of this guide.
You created two .txt files in a first commit, then updated the contents of one file (README.txt) in a second commit.
Here’s a question for you: How do you think Git will store those two commits, or rather the two versions of README.txt?
Will it store full files, i.e. "a marcobehler.com guide" AND "a git guide", somewhere?
Will it store deltas, something like "a (-marcobehler.com)(+git) guide" (pseudo-code)?
Bonus question: How the hell would the answer to this help with merging or rebasing?
Let’s find out!
Inspecting Git repos: 'git cat-file'
Let’s execute a git log in your repository, and you’ll get output similar to this:
# in your repository's directory
git log

commit 142e5cf36d9f2047f24341883bd564b1d5170370 (HEAD -> main)
Author: Marco Behler <firstname.lastname@example.org>
Date:   Tue Dec 28 09:54:44 2021 +0100

    Updated README

commit 715247c8426d3c16881539118e1eafeb38439b1c
Author: Marco Behler <email@example.com>
Date:   Tue Dec 28 09:54:25 2021 +0100

    Project Setup
So far, nothing surprising - you’ll see your two commits. Something that you’ve seen but probably ignored plenty of times are the commit ids. Here’s the second commit’s id:
142e5cf36d9f2047f24341883bd564b1d5170370
It is not just a random id, it’s a SHA-1 hash.
But, what exactly has been hashed here?
Instead of spoiling the answer, let’s use another built-in git command: git cat-file. It basically allows you to have a look at something which git stores somewhere in your repository’s .git folder, given that you happen to know its SHA1-hash. Sounds useful, right?
Execute the following command (and make sure to try this with the SHA1 hash that you are getting for your commit):
# make sure to change the SHA1-hash!
git cat-file -p 142e5cf36d9f2047f24341883bd564b1d5170370
(The -p option makes sure to pretty-print its output.)
You’ll get output similar to this:
# git cat-file's output
tree c4548e069652a6825894699ef7740a620ea0a6a8
parent 715247c8426d3c16881539118e1eafeb38439b1c
author Marco Behler <firstname.lastname@example.org> 1641459065 +0100
committer Marco Behler <email@example.com> 1641459065 +0100

Updated README
Tada! This is what a commit looks like in Git. It’s a text file with…6 lines (well 5, and an empty one to delimit your commit message from the rest). Yes, really.
And if you put those lines, prefixed with a short header that Git adds (the object type and the content's size in bytes), into a sha1sum() function, you’ll end up with your SHA1 hash.
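You can check that claim yourself. The following is a minimal sketch, not an official recipe: it builds a throwaway repository (the temp directory and the "Jane Doe" identity are placeholders made up for this example), then hashes the commit's content by hand. It assumes a repository using Git's default SHA-1 object format and a sha1sum binary on your PATH. Note that Git hashes a small header, "commit <size>" plus a NUL byte, in front of the lines you saw above.

```shell
# Throwaway repository mirroring the guide's first commit (hypothetical location and identity).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt README.txt
git commit -q -m "Project Setup"

# The id Git itself assigned to the commit.
commit_id=$(git rev-parse HEAD)

# Size in bytes of the commit's content (the lines `git cat-file -p` prints).
size=$(git cat-file -s HEAD)

# Hash the header "commit <size>\0" followed by the raw commit content.
manual_id=$( { printf 'commit %s\0' "$size"; git cat-file commit HEAD; } | sha1sum | cut -d ' ' -f 1 )

echo "git:    $commit_id"
echo "manual: $manual_id"
```

Both lines should show the same 40-character hash. If you'd rather let Git do the hashing, piping the same content into git hash-object -t commit --stdin should give you that id as well.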
Now, some of those lines from your commit (file) you’ll be familiar with:
# who committed the file?
committer Marco Behler <firstname.lastname@example.org> 1641459065 +0100

# what's the commit message?
Updated README
Whereas some other parts of the commit probably look unfamiliar:
tree c4548e069652a6825894699ef7740a620ea0a6a8
parent 715247c8426d3c16881539118e1eafeb38439b1c
Let’s (rightly) assume for now that parent simply references the commit that came before the current commit. Then, what does the tree line stand for? Execute another git cat-file to find out!
# make sure to change the SHA1-hash to that of your tree!
git cat-file -p c4548e069652a6825894699ef7740a620ea0a6a8
Look, this tree seems to be yet another text file, referencing (snapshots of) all the files in your repository at the time of the commit!
100644 blob ddd3b7b6335a636af9a9241096455e834f12f636    LICENSE.txt
100644 blob 773fc76fe191ceff24259d4e66efc90e86093b0c    README.txt
Can this be true? Well, you’ll find out by doing one last git cat-file, this time using the SHA1-hash of the README.txt blob:
git cat-file -p 773fc76fe191ceff24259d4e66efc90e86093b0c
Which leads to the following output:
"a git guide"
Does this look familiar? Yes, it is a snapshot of your README.txt file at the time of the second commit, i.e. when you updated the readme. So it does look like Git stores the full file contents for every commit (assuming the contents have changed).
Well, to be sure, let’s repeat the git cat-file game for the first commit (which serves as a great exercise, so refer back to the git log output and repeat the steps!). You’ll end up with something like this:
# cat'ing README.txt snapshotted during the first commit
git cat-file -p fe066d3f7568e13ef031b495e35c94be91b6366c
"a marcobehler.com guide"
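If copying hashes around by hand gets tedious, Git's revision syntax can do the commit -> tree -> blob walk for you. Here's a minimal sketch under a few assumptions: the repository is rebuilt in a throwaway temp directory and the "Jane Doe" identity is a placeholder, both invented for this example. HEAD~1 means "the commit before HEAD", ^{tree} asks for that commit's tree, and :README.txt asks for the blob stored under that path.

```shell
# Throwaway repository mirroring the guide's two commits (hypothetical location and identity).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt README.txt
git commit -q -m "Project Setup"
echo "a git guide" > README.txt
git add README.txt
git commit -q -m "Updated README"

# Resolve the first commit's tree and its README.txt blob without looking up hashes manually.
first_tree=$(git rev-parse 'HEAD~1^{tree}')
first_blob=$(git rev-parse 'HEAD~1:README.txt')

# Confirm what kind of object each id points to.
tree_type=$(git cat-file -t "$first_tree")
blob_type=$(git cat-file -t "$first_blob")
echo "$first_tree is a $tree_type"
echo "$first_blob is a $blob_type"

# Print the README.txt snapshot stored by each commit.
first_readme=$(git cat-file -p "$first_blob")
second_readme=$(git cat-file -p 'HEAD:README.txt')
echo "first commit:  $first_readme"
echo "second commit: $second_readme"
```

Each commit hands you back the complete file contents, not a diff against the other version.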
Take-Away: Git doesn’t store deltas between commits, it always stores snapshots, i.e. the full file, for every commit (as long as the file changed and its SHA1-hash is not already in your repository).
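The part in parentheses is worth a quick experiment: LICENSE.txt never changed, so both commits should point at the very same blob. The sketch below is one way to check that, assuming a freshly built throwaway repository (temp directory and "Jane Doe" identity are placeholders invented for this example). It compares the blob ids each commit records and counts how many blobs exist in total.

```shell
# Throwaway repository mirroring the guide's two commits (hypothetical location and identity).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "-TODO-" > LICENSE.txt
echo "a marcobehler.com guide" > README.txt
git add LICENSE.txt README.txt
git commit -q -m "Project Setup"
echo "a git guide" > README.txt
git add README.txt
git commit -q -m "Updated README"

# Blob ids recorded by the first (HEAD~1) and second (HEAD) commit.
license_first=$(git rev-parse 'HEAD~1:LICENSE.txt')
license_second=$(git rev-parse 'HEAD:LICENSE.txt')
readme_first=$(git rev-parse 'HEAD~1:README.txt')
readme_second=$(git rev-parse 'HEAD:README.txt')

# The id depends only on the file's content: hashing the working copy yields the same blob id.
license_from_content=$(git hash-object LICENSE.txt)

# Count all blobs in the object store.
blob_count=$(git cat-file --batch-all-objects --batch-check | grep -c ' blob ')

echo "LICENSE.txt: $license_first / $license_second"
echo "README.txt:  $readme_first / $readme_second"
echo "blobs stored: $blob_count"
```

You should see one shared id for LICENSE.txt, two different ids for README.txt, and three blobs overall: one LICENSE.txt snapshot plus two README.txt snapshots.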