
Submodule libraries in git to minimize redundancy


I'm very new to using git, and previously haven't really tried to "organize" any projects I've worked on. I just recently purchased a development server for personal use, however, and I wanted to start organizing all my projects and using version control.

I've spent the past 8 hours researching different recommended methods for organizing files in a project, and I realize that it's a very subjective matter. However, I've developed a system that I think will work for just about any case for me, and I have one very objective question regarding how to accomplish a certain task with this directory structure.

Presently I'm looking into a structure akin to the following:

src/ - All deliverables in an uncompiled form (PHP files, C source files, etc.)
data/ - Crucial but unrelated data (SQL databases, etc.)
lib/ - Dependencies -- THIS IS WHERE MY QUESTION LIES
docs/ - Documentation
build/ - Scripts to aid in the build process
test/ - Unit tests
res/ - Not version controlled. Contains PSD files and non-diff-able stuff
.gitignore
README
output.zip - Ready-to-install finished product (just unzip and go)

As I mentioned - my real issue revolves around this lib/ directory. This needs to contain all files and programs which my project requires to run, but which are outside of the scope of my project and I won't be editing. Some features that I need this folder to have:

  • Since these are needed for my final product to run, they must be included in output.zip
  • I would like this folder to be version controlled so that anyone who downloads my git repository will have access to all dependencies
  • If several projects have the same dependency, I do NOT want to have 18 redundant copies of the same file on my server
  • I would like to be able to pull these dependencies from other projects of mine (one project should be able to serve as a library for a separate project)

I can avoid having 18 redundant copies of the same file by using a virtual directory (symlink). However, from my understanding, git would store this symlink as-is in the repository without copying the files it points to. Therefore, anyone else who fetched my repository would have a dangling link and no libraries.
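To make the symlink concern concrete, here is a minimal sketch that simulates it with a plain file copy. It does not invoke git; it only relies on the fact that git records a symlink as the link itself (the target path), not the files behind it. The directory and file names (`shared/libfoo`, `project/lib/libfoo`, `foo.txt`) are hypothetical, and the script assumes a platform where creating symlinks is permitted.

```python
import os
import shutil
import tempfile


def simulate_clone():
    """Copy a project that links to a shared library and report what survives.

    Returns (resolves_on_server, is_link_in_copy, resolves_in_copy).
    """
    with tempfile.TemporaryDirectory() as server, \
            tempfile.TemporaryDirectory() as elsewhere:
        # One shared copy of the library on the development server.
        shared = os.path.join(server, "shared", "libfoo")
        os.makedirs(shared)
        with open(os.path.join(shared, "foo.txt"), "w") as fh:
            fh.write("library contents\n")

        # The project refers to it through a relative symlink inside lib/.
        project = os.path.join(server, "project")
        os.makedirs(os.path.join(project, "lib"))
        link = os.path.join(project, "lib", "libfoo")
        os.symlink(os.path.join("..", "..", "shared", "libfoo"), link)
        resolves_on_server = os.path.exists(link)

        # Stand-in for a clone: copy only the project, keeping links as links.
        copy = os.path.join(elsewhere, "project")
        shutil.copytree(project, copy, symlinks=True)
        copied_link = os.path.join(copy, "lib", "libfoo")

        # os.path.exists follows the link, so it is False when the link dangles.
        return (resolves_on_server,
                os.path.islink(copied_link),
                os.path.exists(copied_link))


resolves_on_server, is_link_in_copy, resolves_in_copy = simulate_clone()
print(resolves_on_server, is_link_in_copy, resolves_in_copy)
```

The link works next to the shared directory, but in the copy it is still a link and no longer resolves, because the shared directory was never part of the project.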

At first it looked like I could do what I wanted using git-submodule. However, from my understanding, this takes the entire contents of another repository and treats it as a sub-directory. Therefore, if I included "dependency A", my libraries folder would look something like:

/lib/A/src/
/lib/A/data/
...
/lib/A/test/
/lib/A/.gitignore
/lib/A/README
/lib/A/output.zip

In the case of a script (PHP, Perl, etc.) I could probably load the dependency using require('lib/A/src/dependency.php'), but in the case of a DLL or binary file I would have no easy way to read the output file from output.zip. I could have the finished project stored directly at the root level instead of wrapped up in a pretty zip file, but if the project were, say, a website - this could mean hundreds of files cluttering up my repository root.
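One way to picture the binary case is a build step that unpacks each dependency's `output.zip` into the project's own bundle. The following is only a sketch under stated assumptions: every dependency sits at `lib/<name>/output.zip` as in the layout above, the project's `src/` files go to the root of the bundle, and the function name `build_output_zip` plus the demo files are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def build_output_zip(project_root, out_name="output.zip"):
    """Bundle src/ plus the contents of every lib/<name>/output.zip."""
    root = Path(project_root)
    out_path = root / out_name
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        # 1. The project's own deliverables, stored relative to src/.
        for path in sorted((root / "src").rglob("*")):
            if path.is_file():
                bundle.write(path, path.relative_to(root / "src").as_posix())
        # 2. Each dependency's packaged output, unpacked under lib/<name>/.
        for dep_zip in sorted((root / "lib").glob("*/output.zip")):
            name = dep_zip.parent.name
            with zipfile.ZipFile(dep_zip) as dep:
                for info in dep.infolist():
                    if not info.is_dir():
                        bundle.writestr(f"lib/{name}/{info.filename}",
                                        dep.read(info))
    return out_path


# Demo on a throwaway project with one hypothetical dependency "A".
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "index.php").write_text("<?php // entry point\n")
    (root / "lib" / "A").mkdir(parents=True)
    with zipfile.ZipFile(root / "lib" / "A" / "output.zip", "w") as dep:
        dep.writestr("bin/a.dll", b"binary placeholder")

    with zipfile.ZipFile(build_output_zip(root)) as result:
        names = sorted(result.namelist())

print(names)
```

With this arrangement the repository root stays uncluttered, and the finished bundle still carries the dependency's binaries at a predictable path.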

How can I include another repository as a library of my own, easily reference the library files within my own project, have the library meaningfully copied to anyone who fetches my repository, and prevent redundant copies of the same files on my development server?

EDIT: After searching on Google for a while I found this similar issue; however, it only addresses PHP projects. While an autoloader may allow you to mask the underlying file system in a PHP environment, how would you apply a similar approach to a C++ project? Or a Python project? Or a Java project?

As I thought more about this project today a few other thoughts came to mind which may require a new direction of thought. First is the problem of very deep library nests. If project A depends on project B which depends on project C which depends on project D then you would have a directory structure like so:

A/lib/
A/lib/B/
A/lib/B/lib/
A/lib/B/lib/C/
A/lib/B/lib/C/lib/
A/lib/B/lib/C/lib/D/

Obviously this would not only get annoying, but redundant in its own way.
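The alternative to nesting is to resolve the whole chain once and place every library side by side. Below is a minimal sketch of that idea using a depth-first traversal; the dependency graph is a hypothetical dictionary mirroring the A, B, C, D chain above, and `flatten_dependencies` is an illustrative name, not part of any tool.

```python
from typing import Dict, List


def flatten_dependencies(project: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return each direct and transitive dependency of `project` exactly once.

    A library is listed after the libraries it depends on.
    """
    ordered: List[str] = []
    on_path = set()   # projects on the current traversal path (cycle check)
    done = set()      # projects already placed in `ordered`

    def visit(name: str) -> None:
        if name in done:
            return
        if name in on_path:
            raise ValueError(f"dependency cycle involving {name!r}")
        on_path.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        on_path.discard(name)
        done.add(name)
        ordered.append(name)

    visit(project)
    ordered.remove(project)  # the project is not its own dependency
    return ordered


# Hypothetical graph: A depends on B, B on C, C on D.
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
flat = flatten_dependencies("A", graph)
for lib in flat:
    print(f"A/lib/{lib}/")
```

Each library then appears once under `A/lib/`, even when two libraries share a dependency.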

How do normal people deal with dependencies when doing a git repository?


Solution

  • In the projects I have worked on, submodules are good only for certain cases of dependency management; in other cases they are complemented by other tools. Mostly, I prefer to use submodules when I need the complete repository, for example when I have a common build script that I can share across projects.

    There are dedicated tools for dependency management in various stacks -

    etc.

    These tools take care of the redundancy management.

    Currently, I am on a .NET project, where we have this setup -

    1. PowerShell build scripts shared across projects using submodules. The build-script repository contains all third-party executables required to deploy any of our .NET applications and their respective PowerShell wrapper scripts, plus some scripts to load the conventions, config, etc.
    2. A NuGet server (via TeamCity) hosting NuGet packages for common binaries shared across projects. NuGet Package Restore is a feature that allows fetching packages as part of the build.