I would like to store Microsoft Visio 2013 diagrams in my Git repository. These diagrams are later converted into SVG and PDF for the software documentation built with Sphinx.
Unfortunately, the .vsdx files are binary files (in fact they are ZIP archives), and Git does not handle binary files well.
I realized that if I unzip my vsdx file, I get plenty of xml files, which are more manageable using Git.
The issue is that I would need to hook some scripts into Git so that only the unzipped files are stored in the repository while the zipped version stays in the working directory. Is that feasible, and is it a desirable way to reduce the overall repository footprint?
The goal is that if I move a shape in my Visio diagram, I don't want to store a near-duplicate of my multi-megabyte vsdx file in the repository. I imagine a 2-megabyte XML file with a one-line change has a better chance of being compressed well in Git packfiles.
Is that correct?
If you're concerned about memory issues when working with large Visio files, why not take advantage of Git's distributed nature and set up multiple repositories? For example:
Root Folder (Git repo)
    .gitignore (ignores the Visio Folder)
    Visio Folder (also a Git repo)
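A minimal sketch of creating that layout from scratch. The directory names root-folder and visio-folder are placeholders, not anything Git requires:

```shell
# Outer repository: holds only what should be kept long term.
git init -q root-folder

# Keep the outer repository from tracking the inner one.
printf 'visio-folder/\n' > root-folder/.gitignore
git -C root-folder add .gitignore

# Inner repository: scratch history for the raw .vsdx files.
git init -q root-folder/visio-folder
```

The two repositories have independent histories, so commits made inside visio-folder never show up in the outer one.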
Work freely in the Visio folder, committing your Visio files without concern. Then, when you're happy with your changes, move the intended file out of the Visio folder into the root folder, extract it, and commit the result. This may seem inelegant, but if your Visio folder becomes unworkable over memory concerns, you can just delete it, since everything you need is in the root repository. (The only real way to keep big binary files from hogging space in a repo is to not commit them.)
If this solution is too crude, set your Visio folder as a remote for the root folder. Have your Visio folder contain two separate branches: one whose commits include your giant files and one whose commits don't. Then fetch only from the branch without the Visio files. If that still doesn't give you the control you need, set up remotes, subfolders, etc. until you get a repository structure that produces a workflow and history that are meaningful to you.
Adding a local remote
cd 'Root Folder'
git remote add visiofiles 'Visio Folder'
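Once the remote exists, it can be narrowed to the branch that carries no .vsdx files. A sketch, assuming that branch is called extracted (a made-up name) and the function is run inside 'Root Folder':

```shell
# Limit the "visiofiles" remote to a single branch, then fetch it.
# "extracted" is a hypothetical name for the branch without the big files.
fetch_extracted_only() {
    git remote set-branches visiofiles extracted &&
    git fetch -q visiofiles
}
```

After this, a plain git fetch visiofiles only downloads commits reachable from that branch, so the large files on the other branch are never copied into the root repository.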
If you're feeling adventurous, you could investigate Git's "clean" and "smudge" filters. The clean filter runs when a file is staged and the smudge filter when it is checked out. A typical use is letting you work with indentation rules that differ from your team's, but you may be able to zip and unzip things with them. If you're extracting Visio files to be better able to inspect changes, you might take advantage of Git's textconv configuration. Git lets you use custom diffs for files, and one method is to convert the file to text and run a diff on that. This does require you to be a bit comfortable with the .gitconfig and .gitattributes files, and to find a suitable program for the text conversion.
The problem you describe, however, was a memory concern, so the hooks and configurations available might not be necessary.
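As a rough illustration of the textconv idea: since a .vsdx is a ZIP archive, unzip itself can serve as the conversion program. The driver name vsdx and the helper name setup_vsdx_diff are arbitrary choices, and the sketch assumes unzip is on the PATH:

```shell
# Run inside the repository that tracks the .vsdx files.
setup_vsdx_diff() {
    # Route *.vsdx files through a diff driver named "vsdx".
    echo '*.vsdx diff=vsdx' >> .gitattributes
    # Git appends the file path to this command; unzip -c prints every
    # archive member to stdout, and -a converts text members' line endings.
    git config diff.vsdx.textconv 'unzip -c -a'
}
```

With this in place, git diff and git log -p show changes in the XML parts instead of "Binary files differ". It only affects how diffs are displayed; what Git stores is still the zipped file.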