Git Spelunking with Bisect
Today, I continued the git spelunking that prompted yesterday’s post. I’ve been diving deeper into the Erlang codebase, and so I needed some new tools.
git-bisect lets you search for a commit using binary search. I wanted to find the last commit that contained a particular Erlang function in the
otp codebase. (Thank you to Yannick for helping me learn how to use this. You can read more about git-bisect in the Git reference manual.) My process looked something like this:
First I start the bisect
git bisect start
Then I specify a commit that didn’t have the function, in this case HEAD:
git bisect bad
Then I specify a commit that does have the function. In this case I knew that the function was in OTP_R14A.
git-bisect lets me specify a tag here:
git bisect good OTP_R14A
At this point, git-bisect automatically takes me to a commit that is roughly halfway between those two commits. While I’m there, I can run whatever checks I want on the command line. If the function is there, I mark git bisect good. If it’s not, I mark git bisect bad. Either way, git-bisect moves me to the next pivot commit in the search.
otp has tens of thousands of commits, but binary search meant that it took around 15 steps to get the commit I wanted.
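To see where "around 15 steps" comes from, here is a rough sketch of the arithmetic. It only models repeated halving; the real step count depends on the shape of the history, and the 30000 figure is an illustrative stand-in, not a count taken from otp.

```shell
#!/bin/sh
# Rough estimate of how many bisect steps a range of N commits needs:
# keep halving (rounding up) until a single candidate is left.
bisect_steps() {
  n=$1
  steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

# 30000 is a hypothetical stand-in for "tens of thousands of commits".
bisect_steps 30000
```

In a real repository, git rev-list --count OTP_R14A..HEAD would give the number of commits in the range to plug in.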
Once I was done, I could run:
git bisect reset
to get back to where I started.
It’s great to know that I can run whatever arbitrary commands I need at each step, but in this case I was running the same command each time.
That’s where git bisect run comes in:
git bisect run
git bisect run lets you run a script to check for good/bad at each step of the binary search, instead of having to do it manually. (Thank you to Mikkel for suggesting this to me. You can read more about git bisect run in the git-bisect documentation.)
The initial suggestion I got was to use git grep, as in:
git bisect start
git bisect bad
git bisect good "[tag]"
git bisect run git grep "[function name]"
But I knew that there were two versions of
is_system_process/1 at various times in the Erlang codebase. I only wanted to know about the older one,
so I used this modified version instead:
git bisect start
git bisect bad
git bisect good OTP_R14A
git bisect run git grep "[function name]" "path/to/folder/*"
path/to/folder/* pointed to the folder I knew the function was in. I initially tried to use the exact filename, but in failing commits the file didn’t exist, and git grep threw an error that git bisect run didn’t know how to deal with. The glob seemed to avoid this problem.
With this, the binary search took a few seconds and quickly returned the same commit that I found manually.
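Here is a minimal sketch of the same idea on a throwaway repository, so it can be run anywhere git is installed. Every name in it (old_helper, src/demo.erl, demo_good) is hypothetical. It also uses the -- separator, which is my variation rather than what I ran above: as far as I can tell, git grep dies on a bare path that doesn't exist in the working tree, while a path after -- is simply treated as "no match". Since git bisect run reads exit code 0 as good and 1 as bad, but aborts on codes above 127, that difference matters.

```shell
#!/bin/sh
set -eu

# Build a throwaway repository (all names here are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

commit() { git commit -q -m "$1"; }

mkdir src
echo 'old_helper() -> ok.' > src/demo.erl
git add src/demo.erl
commit "add old_helper"
git tag demo_good                 # known "good": the function is present

for i in 1 2 3; do
  echo "% note $i" >> src/demo.erl
  git add src/demo.erl
  commit "note $i"
done

git rm -q src/demo.erl            # the commit that bisect should find
commit "remove old_helper"
removed=$(git rev-parse HEAD)

for i in 4 5 6; do
  echo "$i" > "file$i.txt"
  git add "file$i.txt"
  commit "unrelated $i"
done

# At HEAD the file is gone. Compare the exit codes of a bare path
# and a path given after "--".
rc_bare=0; git grep -q old_helper src/demo.erl 2>/dev/null || rc_bare=$?
rc_sep=0;  git grep -q old_helper -- src/demo.erl 2>/dev/null || rc_sep=$?

git bisect start >/dev/null
git bisect bad >/dev/null
git bisect good demo_good >/dev/null
git bisect run git grep -q old_helper -- "src/*" >/dev/null

# refs/bisect/bad points at the first bad commit until the reset.
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "bare path exit: $rc_bare, with --: $rc_sep"
echo "first bad commit: $(git log -1 --format=%s "$first_bad")"
```

Note that bisect reports the first "bad" commit, the one that removed the function. The last commit that still contains it is that commit's parent.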
Now that I had the commit in hand, I wanted to know what the next tagged release that contained that commit was. Enter git-describe.
git-describe is purpose-built for finding tags from commits. By default, it finds the most recent tag that predates the commit. But if you use the
--contains option, it will find a tag that “contains” the commit, which is the tag I want to find. (Credit to Stack Overflow for helping me find this one.)
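If the tag is exactly the commit, then it will only return the tag. Otherwise, it will have a suffix. For the default mode, the git-describe documentation (where you can read more) says the suffix shows:
the number of additional commits on top of the tagged object and the abbreviated object name of the most recent commit.
With --contains, the suffix instead describes the path back from the tag to the commit, in a form like ~3.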
I can use
sed to strip out the suffix, since I only want the tag name:
git describe --contains "<commit>" | sed 's/[~^].*//'
(Thanks to Miccah for this next suggestion. You can read more about git tag in its documentation.) Another way to accomplish this is with
git tag --contains:
git tag --contains "<commit>" --sort=creatordate
This returns all of the tags that contain the commit, sorted by their creation date, so the earliest one is listed first.
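To compare the two approaches side by side, here is a small sketch on a throwaway repository. The tag names v1 and v2 and the fixed dates are made up for the example, and the sed pattern strips a ^ suffix as well as a ~ suffix, which is my assumption about the shapes the output can take.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit with a fixed date so the tag ordering is deterministic.
commit_on() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
    git commit -q --allow-empty -m "$2"
}

commit_on "2020-01-01T00:00:00+0000" "first"
commit_on "2020-01-02T00:00:00+0000" "target"
target=$(git rev-parse HEAD)
commit_on "2020-01-03T00:00:00+0000" "release one"
git tag v1
commit_on "2020-01-04T00:00:00+0000" "release two"
git tag v2

# Name the target relative to a tag that contains it, then strip the suffix.
described=$(git describe --contains "$target")
nearest=$(echo "$described" | sed 's/[~^].*//')

# List every tag that contains the target, oldest first.
containing=$(git tag --contains "$target" --sort=creatordate)

echo "describe --contains: $described"
echo "tag name only:       $nearest"
echo "all containing tags:"
echo "$containing"
```

Both routes should agree on the earliest release here: the stripped describe output and the first line of the git tag listing.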
Now I have the tag I want, but when was it created? I found a couple techniques that work:
git log -1
git log displays a given commit followed by its ancestors. But if you use the
-1 option, it limits the output to
1 commit, the one we pass to it. (Credit to Stack Overflow for this one too. Read more about git log in its documentation.) And we can pass a tag to it:
git log -1 "<tag>"
This will show the default information about the commit behind the tag.
If I only wanted the date, I could use:
git log -1 --format=%ai "<tag>"
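As a quick check of what this prints, here is a sketch on a throwaway repository with a hypothetical tag named demo_tag and a made-up date. One thing worth keeping in mind: %ai is the author date of the commit behind the tag, which may differ from when the tag itself was created.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit with a fixed, made-up date so the output is predictable.
commit_on() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
    git commit -q --allow-empty -m "$2"
}

commit_on "2020-01-02T03:04:05+0000" "release"
git tag demo_tag
commit_on "2020-02-01T00:00:00+0000" "later work"

# -1 limits the log to the single commit the tag points at.
subject=$(git log -1 --format=%s demo_tag)
date=$(git log -1 --format=%ai demo_tag)

echo "$subject"
echo "$date"
```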
git for-each-ref will iterate over all refs that match a given pattern. It also lets you format information from that ref. (I found information on this one from two Stack Overflow answers: one and two. The latter from a comment on the accepted answer. Read more about git for-each-ref in its documentation.)
That pattern can be as specific as a single tag:
git for-each-ref \
  --format="%(refname:short) | %(creatordate)" \
  "refs/tags/OTP_17.0-rc1"
I like this one because I can easily generalize the solution to ask other questions. When was every tag released?
git for-each-ref \
  --format="%(refname:short) | %(creatordate)" \
  "refs/tags"
By default, it seems to sort alphabetically by
refname. If I want them in chronological order, I can add
--sort=creatordate on the end.
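To make the difference between the two orderings visible, here is a sketch on a throwaway repository. The tag names zeta and alpha are hypothetical, chosen so that alphabetical and chronological order disagree, and I put --sort before the pattern, which is one placement among the ones git accepts.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit with a fixed, made-up date so the ordering is deterministic.
commit_on() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
    git commit -q --allow-empty -m "$2"
}

commit_on "2020-01-01T00:00:00+0000" "older release"
git tag zeta
commit_on "2020-02-01T00:00:00+0000" "newer release"
git tag alpha

# Default ordering: by refname.
by_name=$(git for-each-ref --format='%(refname:short)' refs/tags)

# Chronological ordering: by creation date, oldest first.
by_date=$(git for-each-ref --sort=creatordate \
  --format='%(refname:short)' refs/tags)

echo "by name:"; echo "$by_name"
echo "by date:"; echo "$by_date"
```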
I came out of this expedition with a rich collection of tools that I can use in future spelunking. All of these tools are built into
git and have excellent documentation in the reference manual.
I often read source code in order to get a better understanding of the tools, libraries, and software that I use. These
git tools allow me to explore that source code in specific historical context, and understand how codebases evolve and change over time.
Bonus: A Simpler Search Method
When I asked in a Recursers chat how to find the commit I needed, I got more answers after I had already found it. Here is one that was simpler than
git-bisect. (Thanks to Nathan and Benjamin for this suggestion.)
git log -S
You can use
git log -S "<name of function>" to find commits that change the number of occurrences of the string
<name of function>, such as the commits that add or remove it. I found this didn’t take much longer than
git bisect run on the Erlang codebase. And I didn’t have to find an early “good” commit before searching.
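Here is a small sketch of this search on a throwaway repository. The function name old_helper and the file names are hypothetical, and I attach the search string directly to -S, which is equivalent to passing it as a separate argument.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo 'old_helper() -> ok.' > demo.erl
git add demo.erl
git commit -q -m "add old_helper"

echo 'other() -> ok.' > other.erl
git add other.erl
git commit -q -m "unrelated change"

echo '% nothing left here' > demo.erl
git add demo.erl
git commit -q -m "remove old_helper"

# -S searches the diffs (not the commit messages) for commits that
# change how many times the string occurs. Newest commit comes first.
touching=$(git log -Sold_helper --format=%s)

echo "$touching"
```

The newest match is the commit that removed the string, so the first line of the output answers the same question the bisect did.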