
Create a graph from a Git repo


How can I obtain the information needed to create graphs like the ones attached to this post? Essentially, I want the following visual: COMMIT ID -> TREE ID -> BLOBS

[Image 1: Git graph]

Here's another one:

[Image 2]


Solution

  • There are two parts to this problem. One is very easy: getting the graph edges and vertices out of Git. The other is very hard: drawing a "pretty" diagram (ideally planar, or at least with minimal line crossings).

    You seem to be asking about the easy part, so here's the answer: use git cat-file -p to obtain the contents of each Git object, starting with some known Git hash ID or IDs. (Use git rev-parse to obtain the initial IDs.)

    For example:

    $ git rev-parse HEAD
    d35688db19c9ea97e9e2ce751dc7b47aee21636b
    $ git cat-file -p HEAD
    tree 242af4b1a902347da2ff144516fb40c4a28ca257
    parent 43c9e7e365d7a8961767d0bd4a305ca378800a2a
    author Junio C Hamano <gitster@pobox.com> 1507361343 +0900
    committer Junio C Hamano <gitster@pobox.com> 1507361343 +0900
    
    Prepare for -rc1
    
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    

    (The git cat-file command in this example is given the symbolic name HEAD rather than the hash ID printed by git rev-parse, to show that you can start the process with either one.) Examining a commit object gets you exactly one tree line, which gives the commit-to-tree edge, and zero or more parent lines, which provide the hash IDs for the parent edges. Note that this is a DAG and these are outgoing arcs, if you care to draw arrowheads on your edges.
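    A minimal Python sketch of that parsing step, assuming you already have the git cat-file -p text for one commit (parse_commit and commit_edges are hypothetical names, not part of any Git API). The header ends at the first blank line, so a commit-message line that happens to begin with "tree" is not mistaken for a header:

```python
def parse_commit(text):
    """Return (tree_id, [parent_ids]) from `git cat-file -p <commit>` output."""
    tree = None
    parents = []
    for line in text.splitlines():
        if not line:
            break  # blank line: the header is over, the commit message follows
        key, _, value = line.partition(" ")
        if key == "tree":
            tree = value
        elif key == "parent":
            parents.append(value)
        # other headers (author, committer, gpgsig, ...) are ignored here
    if tree is None:
        raise ValueError("no tree line: this does not look like a commit")
    return tree, parents


def commit_edges(commit_id, text):
    """Outgoing arcs of one commit: one to its tree, one per parent."""
    tree, parents = parse_commit(text)
    return [(commit_id, tree, "tree")] + [(commit_id, p, "parent") for p in parents]
```

    A root commit yields only the tree arc; a merge commit has two or more parent lines and so yields two or more parent arcs.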

    A tree object has a relatively rigid internal form, which again can be viewed through git cat-file -p:

    $ git cat-file -p 242af4b1a902347da2ff144516fb40c4a28ca257
    100644 blob 611ab4750bd21e77d0fec41c8b2e115574c692ff    .clang-format
    100644 blob 8ce9c6b8888fe6c12949d30e3e8b461cb67bb43f    .gitattributes
    040000 tree 7ba15927519648dbc42b15e61739cbf5aeebf48b    .github
    100644 blob 833ef3b0b783b8180d0dad1ce336713bddf09b26    .gitignore
    100644 blob cbeebdab7a5e2c6afec338c3534930f569c90f63    .gitmodules
    100644 blob ab85e0d16d6383b13954220a0b41202bd68d5d73    .mailmap
    100644 blob fead995eddd15460b6be81e6a5f7c8f0648368ca    .travis.yml
    100644 blob 8c85014a0a936892f6832c68e3db646b6f9d2ea2    .tsan-suppressions
    100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42    COPYING
    040000 tree 3957dfa63966e1efd20481ebd61311397a34e8ab    Documentation
    100755 blob ab04c977be0cfdb6f282b7911d3fe630d5f70c65    GIT-VERSION-GEN
    100644 blob ffb071e9f03a79a052beaa4372fa790ecbabbb7b    INSTALL
    [more, snipped]
    

    Each output line begins with a "mode", which is 040000 if the object with this name is itself another tree, or 100644 or 100755 if it is an ordinary file (non-executable or executable, respectively). (There are two more modes: 120000 for symbolic links, and 160000 for a "gitlink", which is how Git stores the submodule hash ID for submodules; a gitlink entry's type is shown as commit. See also https://github.com/chris3torek/scripts/blob/master/githash.py for instance.) Following the encoded mode, git cat-file -p prints the underlying Git object type, then the hash ID, then a tab, and then the file name component under which the blob or sub-tree is to be extracted.
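    A sketch of parsing one tree listing into records (TreeEntry, parse_tree, describe and MODE_NAMES are hypothetical names for illustration). It splits on the tab first, so file names containing spaces survive intact. Names with unusual characters may be printed in quoted form, in which case git ls-tree -z is a more robust source:

```python
from typing import NamedTuple

# Modes printed by `git cat-file -p` for tree entries.
MODE_NAMES = {
    "040000": "tree (sub-directory)",
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symbolic link",
    "160000": "gitlink (submodule commit)",
}


class TreeEntry(NamedTuple):
    mode: str
    obj_type: str  # "blob", "tree", or "commit" for a gitlink
    oid: str
    name: str


def parse_tree(text):
    """Parse `git cat-file -p <tree>` output, one "<mode> <type> <hash>\\t<name>" per line."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        meta, _, name = line.partition("\t")  # the tab separates metadata from the name
        mode, obj_type, oid = meta.split()
        entries.append(TreeEntry(mode, obj_type, oid, name))
    return entries


def describe(entry):
    """One human-readable line per entry, e.g. for checking a graph by eye."""
    return f"{entry.name}: {MODE_NAMES.get(entry.mode, 'unknown mode')} {entry.oid[:7]}"
```

    Entries whose obj_type is "tree" are the ones to recurse into; "blob" entries are leaves.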

    Each hash ID identifies exactly one object, so if the same hash ID occurs more than once, you have a shared sub-node. This is the case for several of the blob objects in your example graphs. Note that a top-level tree can be reused as well. For instance, if you have this commit series:

    A <-B <-C   <--master
    

    where commit C is made by a git revert of commit B, it's very likely that A and C use the same top-level tree (which automatically means they use all the same sub-trees and blobs).
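    Putting the pieces together, a minimal Python sketch of the easy part might walk breadth-first from one or more commits, record every arc, and report shared sub-nodes such as the reused top-level tree in the revert example. All function names here are hypothetical; the only Git interfaces assumed are git rev-parse and git cat-file -p as used above, and the starting IDs are assumed to be commits (not annotated tags). For the hard part, it simply emits Graphviz DOT text and leaves the layout to Graphviz, which is one option among several:

```python
import subprocess
from collections import Counter, deque


def rev_parse(*names, repo="."):
    """Resolve names such as HEAD or master to full hash IDs (one per name)."""
    out = subprocess.run(["git", "-C", repo, "rev-parse", *names],
                         check=True, capture_output=True, text=True).stdout
    return out.split()


def git_cat_file(repo="."):
    """Return a function oid -> text, backed by `git cat-file -p`."""
    def cat_file(oid):
        return subprocess.run(["git", "-C", repo, "cat-file", "-p", oid],
                              check=True, capture_output=True, text=True).stdout
    return cat_file


def object_graph(start_commits, cat_file, follow_parents=True):
    """Walk commit -> tree -> sub-trees/blobs (and commit -> parent).

    Returns (edges, kinds): edges is a list of (from_id, to_id, label) arcs,
    kinds maps each visited hash ID to "commit", "tree" or "blob".  Each object
    is visited once, so a shared sub-node is one vertex with several arcs in.
    """
    edges, kinds = [], {}
    queue = deque((oid, "commit") for oid in start_commits)
    while queue:
        oid, kind = queue.popleft()
        if oid in kinds:
            continue  # already reached via another path
        kinds[oid] = kind
        if kind == "blob":
            continue  # leaf: file contents are not needed for the graph
        text = cat_file(oid)
        if kind == "commit":
            for line in text.splitlines():
                if not line:
                    break  # blank line ends the header; the message follows
                key, _, value = line.partition(" ")
                if key == "tree":
                    edges.append((oid, value, "tree"))
                    queue.append((value, "tree"))
                elif key == "parent" and follow_parents:
                    edges.append((oid, value, "parent"))
                    queue.append((value, "commit"))
        else:  # a tree: "<mode> <type> <hash>\t<name>" per line
            for line in text.splitlines():
                meta, _, name = line.partition("\t")
                _mode, child_kind, child = meta.split()
                if child_kind == "commit":
                    continue  # gitlink: the submodule's commit is not in this repository
                edges.append((oid, child, name))
                queue.append((child, child_kind))
    return edges, kinds


def shared_nodes(edges):
    """Hash IDs with more than one incoming arc (reused trees or blobs)."""
    indegree = Counter(dst for _, dst, _ in edges)
    return {oid for oid, count in indegree.items() if count > 1}


def to_dot(edges, kinds):
    """Emit Graphviz DOT text so a layout tool such as dot can draw it."""
    shapes = {"commit": "ellipse", "tree": "triangle", "blob": "box"}
    lines = ["digraph git {"]
    for oid, kind in kinds.items():
        lines.append(f'  "{oid}" [label="{kind} {oid[:7]}", shape={shapes[kind]}];')
    for src, dst, label in edges:
        safe = label.replace('"', '\\"')  # escape double quotes in file names
        lines.append(f'  "{src}" -> "{dst}" [label="{safe}"];')
    lines.append("}")
    return "\n".join(lines)
```

    For example, object_graph(rev_parse("HEAD"), git_cat_file(), follow_parents=False) graphs just the current commit's snapshot; writing to_dot(edges, kinds) to graph.dot and running dot -Tsvg graph.dot -o graph.svg lets Graphviz do the layout. The walk starts one git process per commit or tree, which is fine for small examples but slow on a long history.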