Several tutorials ([1], [2], [3]) on the internet suggest the following configuration for diffing Word documents tracked by git.
Configure a "pandoc" diff driver with the following settings:
[diff "pandoc"]
textconv = pandoc --to=markdown
prompt = false
Add the following to your .gitattributes file:
*.docx diff=pandoc
This seems to work fine except when trying to diff an untracked Word document after indicating intent to add it to the git repository. Does anyone know why this isn't working in this case?
Here are the steps to reproduce, assuming the configuration detailed above.
Create a Word document in a git repository
touch my_document.docx
Open the file in Microsoft Word, add some content to the Word document (e.g., the characters "asdf"), and save it.
Indicate your intent to add the document
git add -N my_document.docx
Try to see the diff:
git diff my_document.docx
#> couldn't parse docx file
#> fatal: unable to read files to diff
With git version 2.17.1 on macOS, I end up with a "fatal: unable to read files to diff" error. However, just adding the file to the index and then running git diff --cached results in the following diff:
diff --git a/my_document.docx b/my_document.docx
new file mode 100644
index 0000000..17f1b0d
--- /dev/null
+++ b/my_document.docx
@@ -0,0 +1 @@
+asdf
Why doesn't the diff driver work with git add -N?
This is ultimately due to the fact that pandoc --to=markdown /dev/null correctly returns nothing without erroring out, whereas pandoc --to=markdown a/my_document.docx errors out whenever a/my_document.docx is an empty file.
So in the case where you've added my_document.docx to the index for the first time and then run git diff --cached to compare the index to HEAD, the comparison will be against /dev/null, and everything will work just fine.
However, in the case where you've indicated your intent to add a new file, my_document.docx, with git add -N, an empty file with the same name will be added to the index. In this case, pandoc will error out when trying to convert the empty file in the index to Markdown.
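One possible workaround, given the explanation above, is to point textconv at a small wrapper that skips pandoc whenever the input is empty. The sketch below is only an illustration: the function name docx_textconv is hypothetical (not part of git or pandoc), and it assumes that git hands the wrapper the path of the file to convert as its first argument, as it does for any textconv command.

```shell
#!/bin/sh
# Hypothetical helper: "docx_textconv" is an illustrative name, not a git or pandoc feature.
# Assumption: $1 is the path of the file that git wants converted to text.
docx_textconv() {
    # [ -s FILE ] is true only when FILE exists and is larger than zero bytes.
    # For an empty file (such as the placeholder entry left by "git add -N"),
    # print nothing and report success instead of letting pandoc fail.
    if [ ! -s "$1" ]; then
        return 0
    fi
    # Non-empty input: convert as in the original configuration.
    pandoc --to=markdown "$1"
}
```

To try this, the function could be saved in an executable script that ends by calling docx_textconv "$1", with the textconv setting of the "pandoc" diff driver pointing at that script instead of at pandoc directly. I have not verified this against every git version, so treat it as a starting point.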