
git diff not working with Word document, --intent-to-add, and pandoc diff driver


Several online tutorials ([1], [2], [3]) suggest the following configuration for diffing Word documents tracked by git.

  1. Configure a "pandoc" diff driver with the following settings:

    [diff "pandoc"]
        textconv = pandoc --to=markdown
        prompt = false
    
  2. Add the following to your .gitattributes file:

    *.docx diff=pandoc
    

This seems to work fine except when trying to diff an untracked Word document after indicating intent to add it to the git repository. Does anyone know why this isn't working in this case?

Here are the steps to reproduce, assuming the configuration detailed above.

  1. Create a Word document in a git repository

    touch my_document.docx
    
  2. Open the file in Microsoft Word, add some content to the document (e.g., the characters "asdf"), and save it

  3. Indicate your intent to add the document

    git add -N my_document.docx
    
  4. Try to see the diff:

    git diff my_document.docx
    #> couldn't parse docx file
    #> fatal: unable to read files to diff
    

With git version 2.17.1 on macOS, I end up with a "fatal: unable to read files to diff" error. However, just adding the file to the index and then running git diff --cached results in the following diff:

    diff --git a/my_document.docx b/my_document.docx
    new file mode 100644
    index 0000000..17f1b0d
    --- /dev/null
    +++ b/my_document.docx
    @@ -0,0 +1 @@
    +asdf

Why doesn't the diff driver work with git add -N?


Solution

  • This is ultimately because pandoc --to=markdown /dev/null correctly returns nothing without erroring out, whereas pandoc --to=markdown a/my_document.docx errors out whenever a/my_document.docx is an empty file.

    So in the case where you've added my_document.docx to the index for the first time and then run git diff --cached to compare the index to HEAD, the comparison will be against /dev/null, and everything will work just fine.

    However, in the case where you've indicated your intent to add a new file, my_document.docx, with git add -N, an empty file with the same name will be added to the index. In this case, pandoc will error out when trying to convert the empty file in the index to Markdown.
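
    One possible workaround, given the explanation above, is to point textconv at a small wrapper that skips pandoc when the file is empty. The following is only a sketch under assumptions: the function name docx_textconv is hypothetical, and it relies on git passing the path of the file to convert as the first argument to the textconv command.

```shell
#!/bin/sh
# Hypothetical textconv wrapper; the name docx_textconv is an assumption.
# Git invokes a textconv command with one argument: the path of the file
# whose contents should be converted to text.
docx_textconv() {
    if [ -s "$1" ]; then
        # Non-empty file: convert it exactly as the original driver does.
        pandoc --to=markdown "$1"
    fi
    # Empty file (such as the placeholder recorded by git add -N):
    # print nothing and return success instead of letting pandoc fail.
}

# Convert only when a path was supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    docx_textconv "$@"
fi
```

    If this were saved as an executable script, the textconv setting of the "pandoc" diff driver would name that script instead of calling pandoc directly. The empty side of the comparison then converts to empty text, so the diff should show the whole document as added.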