Background
In the first post in this series, I claimed that a Git object ID is the SHA-1 hash of the string
<object type name> SP <len> NUL <data>
where
- <data> stands for the output of git cat-file -p {hash}
- <len> means the length of <data>. It can be measured with the command wc -c.
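To make the formula concrete, here is a minimal sketch that applies it to a blob, namely the contents of a file holding the single line "hello world". The helper name sha1_hex is my own invention for this example, not a Git command; it assumes that either sha1sum or shasum is installed. NUL is written as the octal escape \000.

```shell
# Hypothetical helper: print only the hex digest of stdin.
# Assumes sha1sum or shasum is available on the PATH.
sha1_hex() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum; fi | cut -d ' ' -f 1
}

# <len> is the byte count of <data>; the arithmetic expansion strips
# any padding that wc may put in front of the number.
data_len=$(( $(printf 'hello world\n' | wc -c) ))

# <object type name> SP <len> NUL <data>
blob_id=$(printf 'blob %s\000hello world\n' "$data_len" | sha1_hex)
echo "$blob_id"   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

For a blob, <data> is just the file contents, so the formula can be checked without touching a repository.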
Problem
To verify my claim, I followed the steps in Chapter 4 of Version Control with Git.
- Create a folder named hello and go to that directory.
- Initialise an empty Git repository.
- Create the file hello.txt with one single line “hello world”.
- Add the file to Git’s object storage.
- Get a tree object from the index.
- Capture the contents of the tree object in test.txt.
- Count the number of bytes in test.txt.
- Create the file len.txt consisting of "tree" SP <byte count from the previous step> NUL without the line terminator.
- Concatenate the contents of the files len.txt and test.txt and compute the SHA-1 hash of the result.
$ mkdir hello && cd hello
$ git init
$ echo "hello world" > hello.txt
$ git add hello.txt
$ git write-tree
68aba62e560c0ebc3396e8ae9335232cd93a3f60
$ git cat-file -p 68aba62e560c0ebc3396e8ae9335232cd93a3f60 | tee test.txt
100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad hello.txt
$ wc -c test.txt
63 test.txt
$ printf "tree 63\0" > len.txt
$ cat len.txt test.txt | shasum
10bd0f0350027c25edc4ce72aba60e641f55596d -
As can be seen above, I got the wrong SHA-1 hash: the value computed by shasum does not match the tree ID reported by git write-tree. How can I get the right one?
Method
I googled “git tree hash id” and found the accepted answer to
this Stack Overflow question very helpful. The command it includes
can be shortened by replacing echo -en with printf.
Get the object size right
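In this case, the tree 68aba62e560c0ebc3396e8ae9335232cd93a3f60 contains
a single entry: the blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, which
corresponds to the file hello.txt. Its object size should be
37 (= 6 + 1 + 9 + 1 + 20): 6 bytes for the mode 100644, 1 for the space,
9 for the file name hello.txt, 1 for the NUL byte and 20 for the blob ID
in binary form. The 63 bytes counted earlier belong to the human-readable
listing printed by git cat-file -p, not to the stored object.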
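The arithmetic can be double-checked with wc -c. This is only a sketch: the variable names are mine, and the 20 is the length of a raw SHA-1 digest rather than something measured here.

```shell
# Textual part of the tree entry: mode, space, file name, NUL.
entry_text_len=$(( $(printf '100644 hello.txt\000' | wc -c) ))

# A SHA-1 digest stored in binary form takes 20 bytes.
blob_id_len=20

tree_len=$(( entry_text_len + blob_id_len ))
echo "tree $tree_len"   # prints: tree 37
```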
Get the object content right
Since the blob ID is stored as a 20-byte binary value rather than as 40
hex digits, I copied it and pasted it into Vim so that I could easily
insert \x in front of each pair of hex digits in the blob ID. I then put
everything together in one command.
# Contents of testing.sh as seen inside Vim
# Note that there's NO newline character inside the following command: it is one single line
printf "tree 37\x00100644 hello.txt\x00\x3b\x18\xe5\x12\xdb\xa7\x9e\x4c\x83\x00\xdd\x08\xae\xb3\x7f\x8e\x72\x8b\x8d\xad" | shasum
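The manual editing in Vim can also be scripted. The following is a minimal sketch rather than the method from the Stack Overflow answer: it turns each pair of hex digits into an octal escape, because \x is not among the escapes that POSIX specifies for printf, whereas octal \ddd is. The helper sha1_hex and all variable names are made up for this example, and it assumes sha1sum or shasum is installed.

```shell
# Hypothetical helper: print only the hex digest of stdin.
sha1_hex() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum; fi | cut -d ' ' -f 1
}

blob_hex=3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# Walk through the ID two hex digits at a time and build a string of
# octal escapes such as \073 (0x3b) that printf can turn into raw bytes.
escapes=
rest=$blob_hex
while [ -n "$rest" ]; do
  pair=${rest%"${rest#??}"}   # first two characters of $rest
  rest=${rest#??}             # everything after them
  escapes="$escapes$(printf '\\%03o' "0x$pair")"
done

# Header, then the single tree entry: mode SP name NUL <20 raw bytes>.
tree_id=$(printf "tree 37\\000100644 hello.txt\\000$escapes" | sha1_hex)
echo "$tree_id"   # 68aba62e560c0ebc3396e8ae9335232cd93a3f60
```

Writing NUL as the three-digit \000 also keeps the 1 of 100644 from being read as part of the escape.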
Result
By executing the above command, I got the right SHA-1 hash:
68aba62e560c0ebc3396e8ae9335232cd93a3f60.
Fact learnt: formatting find’s output with -printf
In the Stack Overflow question, there’s a command
find .git/objects/ -type f -printf "%h%f %s\n"
- The flag -type f stands for files. Without this flag, directories like .git/objects will be displayed.
- The flag -printf formats the output.
  - %h means the head of the file name without the last component of the file name. Thus, it expands to the path of a directory without the trailing /.
  - %f means the last component of the file name. As a result, the / inside the displayed SHA-1 hashes are taken away.
  - %s means the file size in bytes.
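To see the effect of %h, %f and %s without a real repository, here is a small sketch that imitates the two-level layout of .git/objects in a throwaway directory. It assumes GNU find, since -printf is a GNU extension; the directory and variable names are invented for the demonstration.

```shell
# Build a scratch directory shaped like .git/objects: the first two hex
# digits of the ID form the folder name, the remaining 38 the file name.
demo=$(mktemp -d)
mkdir -p "$demo/objects/3b"
printf 'x' > "$demo/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad"

# %h is "objects/3b", %f is the file name and %s is the size in bytes,
# so gluing %h and %f together drops the / between the two parts of the ID.
listing=$(cd "$demo" && find objects -type f -printf '%h%f %s\n')
echo "$listing"   # objects/3b18e512dba79e4c8300dd08aeb37f8e728b8dad 1

rm -rf "$demo"
```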