Tags: git, graph, networkx, git-log

Extract git graph for processing


Given a GitHub repository, I need to extract the graph of its commits, branches, etc. so that I can process it with scripts.
I know that, once I have cloned the repository, I can use the log command like this:

git log --graph --abbrev-commit --decorate --date=relative --all  

but its output cannot be processed easily, if at all.
After many unsuccessful attempts, I found this tool (git-dot), which generates a .dot file representing the graph of a given repository. Working with the graph was then easy, since I could import the .dot file into NetworkX. However, I don't think the tool works very well: I get fewer commits than the number shown on GitHub, too many cycles, and so on.

Are there other tools, or a way to format the output of the log command, that would give me a graph I can process with my scripts? I hope you can help me.


Solution

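  • git rev-list --all --parents gives you the raw data: one line per commit, containing the commit's hash followed by the hashes of its parents. You can annotate it however you want. Also, Git ancestry graphs don't have cycles: a commit's hash is computed over its content, which includes its parents' hashes, so a commit can never become its own ancestor. Cycles in a tool's output point to a bug in the tool, not in the repository.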

    Here are the basics of what the tool you found must be doing:

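    # Each line of rev-list output is "<commit> <parent1> <parent2> ..."
    git rev-list --all --parents \
    | awk ' BEGIN{print "strict digraph git {"}
            # root commit (no parents): emit it as a lone node
            NF==1 {print "\""$1"\";"}
            # otherwise: one edge from the commit to each of its parents
            NF>1 { for (n=2; n<=NF; ++n) print "\""$1"\" -> \""$n"\";" }
            END{print "}"}' \
    | dot -Tpng -otest.png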
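Since the goal is to process the graph with NetworkX scripts, the same idea can be done directly in Python without going through a .dot file. The following is a minimal sketch (Python 3.9+ for graphlib): it parses the rev-list output into a dict mapping each commit to its list of parents, checks that the result is acyclic, and, as a cross-check, produces the same DOT text as the awk script. The helper names parse_rev_list, is_acyclic, to_dot and load_repo_graph are hypothetical, not part of git or NetworkX; load_repo_graph assumes git is on PATH and that repo_path points to a local clone.

```python
import subprocess
from graphlib import CycleError, TopologicalSorter

# Hypothetical helpers for illustration; not part of git or NetworkX.


def parse_rev_list(text):
    """Parse `git rev-list --parents` output into {commit: [parents...]}.

    Each non-empty line is "<commit> <parent1> <parent2> ...";
    root commits have no parents, merge commits have two or more.
    """
    parents = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            parents[fields[0]] = fields[1:]
    return parents


def is_acyclic(parents):
    """Return True if the commit -> parent graph contains no cycle."""
    try:
        # static_order() raises CycleError as soon as it detects a cycle
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def to_dot(parents):
    """Build the same DOT text as the awk script above."""
    lines = ["strict digraph git {"]
    for commit, commit_parents in parents.items():
        if not commit_parents:
            # root commit: emit it as a lone node
            lines.append(f'"{commit}";')
        for parent in commit_parents:
            # one edge from the commit to each of its parents
            lines.append(f'"{commit}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


def load_repo_graph(repo_path="."):
    """Run `git rev-list --all --parents` in repo_path and parse the result."""
    out = subprocess.run(
        ["git", "rev-list", "--all", "--parents"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_rev_list(out)
```

With NetworkX installed, networkx.from_dict_of_lists(parents, create_using=networkx.DiGraph) turns the dict into a directed graph with commit -> parent edges, root commits included as nodes. Comparing len(parents) with the number printed by git rev-list --all --count is a quick way to check that no commits were dropped along the way.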