Git Spelunking with Bisect
Today, I continued the git spelunking that prompted yesterday’s post. I’ve been diving deeper into the Erlang codebase, so I needed some new tools.
git bisect
git-bisect lets you search for a commit using binary search. I wanted to find the last commit that contained a particular Erlang function in the otp codebase. (Thank you to Yannick for helping me learn how to use this. You can read more in the git-bisect documentation.) My process looked something like this:
First, I start the bisect:
git bisect start
Then I specify a commit that didn’t have the function, in this case HEAD:
git bisect bad
Then I specify a commit that does have the function. In this case I knew that the function was in OTP_R14A. git-bisect lets me specify a tag here:
git bisect good OTP_R14A
At this point, git-bisect automatically checked out a commit roughly halfway between those two commits. While I’m there, I can run whatever checks I want on the command line:
rg '<function-to-find>'
If the function is there, I mark the commit with git bisect good. If it’s not, I mark it with git bisect bad. Either way, git-bisect moves me to the next pivot commit in the search. otp has tens of thousands of commits, but binary search meant that it took only around 15 steps to reach the commit I wanted.
Once I was done, I could get back to where I started by running:
git bisect reset
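As a rough check on the "around 15 steps" figure: the number of bisect steps grows with the base-2 logarithm of the number of commits in the range. Here is a minimal sketch in plain shell arithmetic. bisect_steps is a hypothetical helper, and 30000 is an assumed stand-in for "tens of thousands", not a measured count.

```shell
# Hypothetical helper: estimate how many bisect steps a range of N commits
# needs. This computes ceil(log2(N)) by repeated doubling.
bisect_steps() {
  n=$1
  steps=0
  span=1
  while [ "$span" -lt "$n" ]; do
    span=$((span * 2))
    steps=$((steps + 1))
  done
  echo "$steps"
}

bisect_steps 30000   # assumed commit count, for illustration only
```

For a real range, git rev-list --count OTP_R14A..HEAD gives the number of commits to feed in. The actual step count can differ a little, since real history is not a straight line.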
It’s great to know that I can run arbitrary commands at each manual step, but here I was running the same command every time. That’s where git bisect run comes in:
git bisect run
git bisect run lets you run a script to check for good/bad at each step of the binary search, instead of having to do it manually. (Thank you to Mikkel for suggesting this to me. You can read more about git bisect run in the git-bisect documentation.)
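The initial suggestion I got was to use git grep, which exits with status 0 when it finds a match (good, as far as git bisect run is concerned) and 1 when it doesn’t (bad), as in: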
git bisect start
git bisect bad
git bisect good "[tag]"
git bisect run git grep "[function name]"
But I knew that there were two versions of is_system_process/1 at various times in the Erlang codebase. I only wanted to know about the older one, so I used this modified version instead:
git bisect start
git bisect bad
git bisect good OTP_R14A
git bisect run git grep "[function name]" "path/to/folder/*"
Here, path/to/folder/* pointed to the folder I knew the function was in. (I initially tried to use the exact filename, but in failing commits the file didn’t exist, and git grep threw an error that git bisect run didn’t know how to deal with. The glob seemed to avoid this problem.)
With this, the binary search finished in a few seconds and returned the same commit that I had found manually.
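To make the mechanics concrete, here is a sketch of the same idea in a throwaway repository. Everything in it is made up for illustration: the file src/mod.erl, the function old_fn, the tag known_good, and the save helper. It relies on the documented exit-code convention of git bisect run: 0 marks a commit good, 1 through 127 (except 125) marks it bad, 125 skips it, and anything else aborts the bisect. Wrapping the check in sh -c with an explicit file test is one way, under those assumptions, to turn "file is missing" into a plain "bad" instead of an error.

```shell
# Throwaway repository: the made-up function old_fn lives in src/mod.erl for
# commits 1-3, and commit 4 deletes the file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
save() { git add -A && git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

mkdir src
for i in 1 2 3; do
  echo "old_fn() -> ok. % revision $i" > src/mod.erl
  save "commit $i"
done
git tag known_good HEAD~2            # commit 1 plays the role of OTP_R14A
last_with_fn=$(git rev-parse HEAD)   # commit 3, recorded to check the result
git rm -q src/mod.erl
save "commit 4: remove old_fn"
echo "notes" > README
save "commit 5"

# Same sequence as in the post, with "good" meaning "still has the function".
git bisect start >/dev/null
git bisect bad >/dev/null
git bisect good known_good >/dev/null
# Exit 0 (good) if the file exists and mentions old_fn, exit 1 (bad) otherwise.
git bisect run sh -c 'test -f src/mod.erl && grep -q "old_fn" src/mod.erl' >/dev/null

# refs/bisect/bad now names the first commit without the function, so its
# parent is the last commit that still had it. Read it before resetting.
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1
last_good=$(git rev-parse "$first_bad^")
echo "last commit containing old_fn: $last_good"
```

Note that bisect reports the first "bad" commit, so the commit I was actually after (the last one containing the function) is its parent.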
Now that I had the commit in hand, I wanted to know which tagged release first contained that commit. Enter git-describe:
git describe
git-describe is purpose-built for finding tags from commits. By default, it finds the most recent tag that predates the commit. But if you use the --contains option, it will find the tag that “contains” the commit, which is the tag I want to find. (Credit to Stack Overflow for helping me find this one.)
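If the tag points exactly at the commit, then it will only return the tag. Otherwise, the output has a suffix. For the default mode, the documentation describes that suffix as:
the number of additional commits on top of the tagged object and the abbreviated object name of the most recent commit.
With --contains, the suffix instead looks like ~<n>, meaning the commit sits <n> commits back from the tag. (Source: the git-describe documentation, where you can read more.)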
I can use sed to strip out the suffix, since I only want the tag name:
git describe --contains "<commit>" | sed 's/~.*//'
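The same trimming can also be done with shell parameter expansion instead of sed. This is a minimal sketch: tag_only is a hypothetical helper name and the sample input is made up. It also strips a ^-style suffix, on the assumption that name-rev-style output such as <tag>^0 can show up for some tags.

```shell
# Hypothetical helper: reduce `git describe --contains` output to a bare tag
# name. Git ref names cannot contain "~" or "^", so everything from the first
# of either character onward is treated as the suffix.
tag_only() {
  name=${1%%\~*}          # drop the longest trailing part starting at "~"
  echo "${name%%\^*}"     # then drop anything starting at "^"
}

tag_only "OTP_17.0-rc1~123"   # sample input, made up for illustration
```

In practice it would be called as tag_only "$(git describe --contains "<commit>")".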
Another way to accomplish this is with git tag --contains (thanks to Miccah for this suggestion; you can read more in the git-tag documentation):
git tag --contains "<commit>" --sort=creatordate
This returns all of the tags that contain the commit, sorted by creation date, so the first line is the earliest one.
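Since the list is sorted oldest first, taking the first line gives the earliest release that contains the commit. Here is a sketch in a throwaway repository. The tag names, the dates, and the commit_on helper are all made up, and the commit dates are pinned so that the ordering is deterministic.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical helper: make a commit dated at noon UTC on the given day.
commit_on() {
  echo "$1" >> file.txt
  git add file.txt
  GIT_AUTHOR_DATE="$1 12:00:00 +0000" GIT_COMMITTER_DATE="$1 12:00:00 +0000" \
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "work on $1"
}

commit_on 2020-01-01; git tag v0.9
commit_on 2020-02-01; target=$(git rev-parse HEAD)   # the commit of interest
commit_on 2020-03-01; git tag v1.0
commit_on 2020-04-01; git tag v1.1

# v0.9 predates the target, so only v1.0 and v1.1 contain it. With the list
# sorted by creation date, the first line is the earliest containing tag.
first_tag=$(git tag --contains "$target" --sort=creatordate | head -n 1)
echo "$first_tag"
```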
Now I have the tag I want, but when was it created? I found a couple of techniques that work:
Technique 1: git log -1
git log displays a commit and its ancestors. But if you use the -1 option, it limits the output to 1 commit: only the one we pass to it. (Credit to Stack Overflow for this one too. Read more in the git-log documentation.) And we can pass a tag to it:
git log -1 "<tag>"
This will show the default information about the commit behind the tag.
If I only wanted the date (%ai is the author date of the tagged commit), I could use:
git log -1 --format=%ai "<tag>"
Technique 2: git for-each-ref
git for-each-ref iterates over all refs that match a given pattern. It also lets you format information from each ref. (I found information on this one from two Stack Overflow answers, the latter from a comment on the accepted answer. Read more in the git for-each-ref documentation.)
That pattern can be as specific as a single tag:
git for-each-ref \
--format="%(refname:short) | %(creatordate)" \
"refs/tags/OTP_17.0-rc1"
I like this one because I can easily generalize the solution to answer other questions. For example, when was every tag released?
git for-each-ref \
--format="%(refname:short) | %(creatordate)" \
"refs/tags"
By default, it seems to sort alphabetically by refname. If I want them in chronological order, I can add --sort=creatordate to the command.
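Here is a small sketch of the difference between the two orderings, again in a throwaway repository. The tag names are made up and deliberately chosen so that alphabetical and chronological order disagree. The release helper is hypothetical, and %(creatordate:short) is a date-only variant of the %(creatordate) field used above.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical helper: create a commit with a pinned date and tag it.
release() {
  echo "$1" >> file.txt
  git add file.txt
  GIT_AUTHOR_DATE="$2 12:00:00 +0000" GIT_COMMITTER_DATE="$2 12:00:00 +0000" \
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "release $1"
  git tag "$1"
}

release b-first  2020-01-01
release a-second 2020-02-01
release c-third  2020-03-01

# Default ordering: by refname.
by_name=$(git for-each-ref --format="%(refname:short)" "refs/tags")

# Chronological ordering: sort on the same field that is being printed.
by_date=$(git for-each-ref --sort=creatordate \
  --format="%(refname:short) | %(creatordate:short)" "refs/tags")

echo "$by_name"
echo "$by_date"
```

Prefixing the sort key with a minus sign, as in --sort=-creatordate, reverses the order so the newest tag comes first.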
Takeaways
I came out of this expedition with a rich collection of tools that I can use in future spelunking. All of these tools are built into git and have excellent documentation in the reference manual.
I often read source code to get a better understanding of the tools, libraries, and software that I use. These git tools allow me to explore that source code in its specific historical context, and to understand how codebases evolve and change over time.
Bonus: A Simpler Search Method
When I asked in a Recursers chat how to find the commit I needed, more answers arrived after I had already found it. Here is one that is simpler than git-bisect (thanks to Nathan and Benjamin for this suggestion):
git log -S
You can use git log -S "<name of function>" to find commits that change the number of occurrences of the string <name of function>, that is, commits that add or remove it. I found this didn’t take much longer than git bisect run on the Erlang codebase. And I didn’t have to find an early “good” commit before searching.
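Here is a sketch of how that plays out, using a throwaway repository in which a made-up function old_fn is added in one commit and removed in a later one. The file name, commit messages, and save helper are all hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
save() { git add -A && git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo "start() -> ok." > mod.erl;                   save "initial"
echo "old_fn() -> ok." >> mod.erl;                 save "add old_fn"
echo "% unrelated comment" >> mod.erl;             save "unrelated edit"
grep -v "old_fn" mod.erl > tmp && mv tmp mod.erl;  save "remove old_fn"

# -S lists only the commits where the number of occurrences of the string
# changed, newest first: here the removal, then the addition.
touched=$(git log -S "old_fn" --format=%s)
echo "$touched"

# The newest such commit removed the string, so its parent is the last commit
# that still contained it.
removal=$(git log -1 -S "old_fn" --format=%H)
last_with=$(git log -1 --format=%s "$removal^")
echo "$last_with"
```

The "unrelated edit" commit touches the same file but leaves the string alone, so -S skips it. As with bisect, the commit that -S reports is the one that removed the function, and its parent is the last commit that contained it.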