Background
In the first post in this series, I claimed that an object ID in Git is the SHA-1 hash of the string
<object type name> SP <len> NUL <data>
where <data> stands for the output of git cat-file -p {hash} and <len> is the length of <data> in bytes, which can be measured with the command wc -c.
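For a blob, this formula can be checked without Git at all. The following is a minimal sketch, assuming the content is the single line "hello world" followed by a newline (12 bytes); sha1 and blob_id are hypothetical helper names introduced only for this illustration.

```shell
# Hypothetical helper: print only the hex digest, using whichever SHA-1 tool is installed.
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum; fi | cut -d' ' -f1
}

# "blob" SP <len> NUL <data>, where <data> is "hello world\n" (12 bytes).
blob_id() {
  printf 'blob 12\0hello world\n' | sha1
}

blob_id
# prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed value is the blob ID that Git assigns to a file with this content.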
Problem
To verify my claim, I followed the steps in Chapter 4 of Version Control with Git.
- Create a folder named hello and go to that directory.
- Initialise an empty Git repository.
- Create the file hello.txt with one single line “hello world”.
- Add the file to Git’s object storage.
- Get a tree object from the index.
- Capture the contents of the tree object in test.txt.
- Count the number of bytes in test.txt.
- Create the file len.txt consisting of "tree" SP <byte count from the previous step> NUL without the line terminator.
- Concatenate the contents of the files len.txt and test.txt and compute the SHA-1 hash of the result.
$ mkdir hello && cd hello
$ git init
$ echo "hello world" > hello.txt
$ git add hello.txt
$ git write-tree
68aba62e560c0ebc3396e8ae9335232cd93a3f60
$ git cat-file -p 68aba62e560c0ebc3396e8ae9335232cd93a3f60 | tee test.txt
100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad hello.txt
$ wc -c test.txt
63 test.txt
$ printf "tree 63\0" > len.txt
$ cat len.txt test.txt | shasum
10bd0f0350027c25edc4ce72aba60e641f55596d -
As can be seen above, I got the wrong SHA-1 hash: it doesn’t match the tree ID 68aba62e560c0ebc3396e8ae9335232cd93a3f60 printed by git write-tree. How can I get the right one?
Method
I googled “git tree hash id” and found the chosen answer of this Stack Overflow question very helpful. One may shorten the command included there by replacing echo -en with printf.
Get the object size right
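In this case, the tree 68aba62e560c0ebc3396e8ae9335232cd93a3f60 contains a single entry: the blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, which corresponds to the file hello.txt. Its object size should be 37 (= 6 + 1 + 9 + 1 + 20): 6 bytes for the mode 100644, 1 for the space, 9 for the file name hello.txt, 1 for the NUL byte and 20 for the blob ID stored as a binary value.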
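The gap between 63 and 37 can be reproduced by counting bytes in both layouts. This is only a sketch: pretty_size and raw_size are hypothetical helper names, and the pretty-printed layout (mode, type, 40 hex digits, a tab, the name, a newline) is assumed from the git cat-file -p output shown earlier.

```shell
# Bytes in the pretty-printed line: "<mode> SP <type> SP <40 hex digits> TAB <name> LF".
pretty_size() {
  echo $(( $(printf '%s %s %s\t%s\n' "$1" "$2" "$3" "$4" | wc -c) ))
}

# Bytes in the raw entry: "<mode> SP <name> NUL" followed by a 20-byte binary SHA-1.
raw_size() {
  head_len=$(printf '%s %s\0' "$1" "$2" | wc -c)
  echo $(( head_len + 20 ))
}

pretty_size 100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad hello.txt   # prints 63
raw_size 100644 hello.txt                                                    # prints 37
```

Inside the repository, git cat-file -s 68aba62e560c0ebc3396e8ae9335232cd93a3f60 should report the same raw size.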
Get the object content right
As the blob ID is stored as a binary value, I copied it and pasted it into Vim so that I could easily insert \x in front of each pair of hex digits in the blob ID. I then put everything together in one command.
# Contents of `testing.sh' as seen inside Vim
# Note that there's NO newline character in the following command
printf "tree 37\x00100644 hello.txt\x00\x3b\x18\xe5\x12\xdb\xa7\x9e\x4c\x83\x00\xdd\x08\xae\xb3\x7f\x8e\x72\x8b\x8d\xad" | shasum
Result
By executing the above command, I got the right SHA-1 hash: 68aba62e560c0ebc3396e8ae9335232cd93a3f60.
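Inserting \x by hand can be avoided by converting the hex digits in the shell. The sketch below is one possible approach, not the one from the Stack Overflow answer: hex_to_bin, tree_id and sha1 are hypothetical helper names, and the conversion goes through octal escapes so that it doesn't depend on printf understanding \x.

```shell
# Hypothetical helper: print only the hex digest, using whichever SHA-1 tool is installed.
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum; fi | cut -d' ' -f1
}

# Write the bytes encoded by a string of hex digits to standard output.
hex_to_bin() {
  hex=$1
  while [ -n "$hex" ]; do
    rest=${hex#??}          # everything after the first two digits
    pair=${hex%"$rest"}     # the first two digits
    hex=$rest
    # Turn the pair into a three-digit octal escape and let printf emit that byte.
    printf "\\$(printf '%03o' "0x$pair")"
  done
}

# "tree" SP <len> NUL, then one entry: <mode> SP <name> NUL <20-byte binary blob ID>.
tree_id() {
  {
    printf 'tree 37\0'
    printf '100644 hello.txt\0'
    hex_to_bin 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
  } | sha1
}

tree_id
# prints 68aba62e560c0ebc3396e8ae9335232cd93a3f60
```

The object size 37 is still hard-coded here; a tree with more entries would need the sum of all entry sizes.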
Fact learnt: formatting find’s output with -printf
In the Stack Overflow question, there’s the command
find .git/objects/ -type f -printf "%h%f %s\n"
- The flag -type f restricts the results to regular files. Without this flag, directories like .git/objects would be displayed as well.
- The flag -printf formats the output.
  - %h means the leading part of the file name, without its last component. Thus, it expands to the path of a directory without the trailing /.
  - %f means the last component of the file name. As %h and %f are written back to back, the / inside the displayed SHA-1 hashes is taken away.
  - %s means the file size in bytes.
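The effect of %h%f can be tried on a throwaway directory that imitates the layout of .git/objects. This is a sketch assuming GNU find (other implementations may not have -printf); demo_find_printf and the file name 3b/18e5 are made up for the illustration, and the file is a plain text file rather than a real compressed Git object.

```shell
demo_find_printf() {
  dir=$(mktemp -d) || return 1
  # Imitate the two-level layout: a 2-character directory, then the rest of the name.
  mkdir -p "$dir/objects/3b"
  printf 'hello world\n' > "$dir/objects/3b/18e5"
  # %h is "objects/3b", %f is "18e5", %s is the size in bytes.
  (cd "$dir" && find objects -type f -printf '%h%f %s\n')
  rm -rf "$dir"
}

demo_find_printf
# prints: objects/3b18e5 12
```

In a real repository the sizes listed this way are those of the compressed files on disk, so they generally differ from the object sizes used in the hash.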