Blog 1

Random Talk on Random Thoughts

Git Object ID Generation (4): General Trees

| Comments |


After I’ve written the third post in this series, I believed that I could generate the SHA-1 hash of all Git objects.


In order to understand the object ID of an arbitrary tree object, it is necessary that I create a file in a sub-folder. Suppose that I copied the file hello.txt to the sub-directory subdir in the directory hello in the second post In short, I just followed the steps in Chapter 4 of Version Control with Git. I include the setup here for convenience.

$ mkdir hello && cd hello
$ git init
$ echo "hello world" > hello.txt
$ git add hello.txt
$ git write-tree
$ mkdir subdir
$ cp hello.txt subdir
$ git add subdir/hello.txt
$ git write-tree
$ git cat-file -p 492413269336d21fac079d4a4672e55d5d2147ac
100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad	hello.txt
040000 tree 68aba62e560c0ebc3396e8ae9335232cd93a3f60	subdir

After having successfully generated the SHA-1 hash for the tree object 68aba62e560c0ebc3396e8ae9335232cd93a3f60 in the second post in this series, I tried the same task for another tree object 492413269336d21fac079d4a4672e55d5d2147ac. Using the same technique describe in the previous post, I got another wrong SHA-1 hash 06eb95bda67a8f86e65bb1590744f10a61eeccef.

# Note: I *didn't* type enter in the following command.  Just keep typing.
$ printf "tree 71\x00100644 hello.txt\x00\x3b\x18\xe5\x12\xdb\xa7\x9e\x4c\x83\x0
0\xdd\x08\xae\xb3\x7f\x8e\x72\x8b\x8d\xad040000 subdir\x00\x68\xab\xa6\x2e\x56\x
06eb95bda67a8f86e65bb1590744f10a61eeccef  -

How to get the right object ID?

Cause of error

I realised that I had misunderstood the structure of a tree object again after reading the first Google search result of “git tree object format”. It’s a Stack Overflow question on the “format of Git tree object”. I read the largest code block of the first answer, and find out that the leftmost zero digit in 040000 should be taken away.

A primitive method

Get the object size with the wc command

$ printf "100644 hello.txt\x00\x3b\x18\xe5\x12\xdb\xa7\x9e\x4c\x83\x00\xdd\x08\x
ae\xb3\x7f\x8e\x72\x8b\x8d\xad40000 subdir\x00\x68\xab\xa6\x2e\x56\x0c\x0e\xbc\x
33\x96\xe8\xae\x93\x35\x23\x2c\xd9\x3a\x3f\x60" | wc -c

SHA-1 hash

$ printf "tree 70\x00100644 hello.txt\x00\x3b\x18\xe5\x12\xdb\xa7\x9e\x4c\x83\x0
0\xdd\x08\xae\xb3\x7f\x8e\x72\x8b\x8d\xad40000 subdir\x00\x68\xab\xa6\x2e\x56\x0
c\x0e\xbc\x33\x96\xe8\xae\x93\x35\x23\x2c\xd9\x3a\x3f\x60" | shasum
492413269336d21fac079d4a4672e55d5d2147ac  -

A more time-saving method

The above printf command is error-prone. Here’re some less laborious commands.

Understand the tree object

While searching for the cause of error, I jumped through many web pages, and I went back to the Stack Overflow question stated in the second post in this series.

$ git cat-file tree 492413269336d21fac079d4a4672e55d5d2147ac | od -c
0000000   1   0   0   6   4   4       h   e   l   l   o   .   t   x   t
0000020  \0   ; 030 345 022 333 247 236   L 203  \0 335  \b 256 263 177
0000040 216   r 213 215 255   4   0   0   0   0       s   u   b   d   i
0000060   r  \0   h 253 246   .   V  \f 016 274   3 226 350 256 223   5
0000100   #   , 331   :   ?   `
  • The c flag: show the input as characters if possible, otherwise as octal 1-byte units.
  • The b flag: show the input as octal 1-byte units.

Note that the number of bytes can be found at the bottom left hand corner. This is actually the object size of the tree object 492413269336d21fac079d4a4672e55d5d2147ac.

One can capture the binary output and dump it to od with one command.

$ git cat-file tree 4924132 | tee test.txt | od -c

An improved printf command

$ printf "tree 70\0" > len.txt
$ cat len.txt test.txt | shasum
492413269336d21fac079d4a4672e55d5d2147ac  -

We finally get the target object ID in three steps.

Facts learnt

Another use of git cat-file

$ git cat-file -s 492413269336d21fac079d4a4672e55d5d2147ac
  • The -s flag: size
  • The -p flag: pretty-print

From the word “pretty” in the man page for git-cat-file, I understand why I had misunderstood the structure of Git tree objects.

Use od like hd

From a comment to the second answer to the Stack Overflow question about the “format of git tree object”, I saw the word hexdump, and I viewed its man page. At first, I didn’t know their difference, so I googled “od vs hexdump”, and then I saw the abbreviation hd for hexdump, so I changed the search query string to “hd vs od”, but found out that their functions are basically the same, but their display is different by default. I like the default display of hd. To use od like hd, one only needs to copy the command from the man page of od.