
Cloning Only Newly Uploaded Files In Git Commit


I am trying to download files from an online repository, mostly PDFs.

However, I only want to download the files of a specific commit. The total archive is over 1400 files, and the latest commit adds roughly 300 files to the total archive.

How do I clone only the newly uploaded 300 files from the repository?

Unlike other similar questions I have come across relating to downloading a single file, I would like to download the entire commit, which is over 300 files. For reference, the repo is here:

https://github.com/KingOfCramers/sidtoday

... and the commit of the new files that I would like to download (to my local computer) is here:

https://github.com/KingOfCramers/sidtoday/commit/07b7008f215ffe784068d9d2d14fb5d76875ca24


Solution

  • is there really no way to simply clone/download the files uploaded by an individual commit?

    Yes, there really is no way to simply clone or download the files updated in an individual commit.

    At least, there's no in-Git way. You can use Git as a tool to build whatever you like, if you control the server. If the server is GitHub, well, see the last paragraph.

    The root of this problem is that a commit does not contain only changed files, nor does it contain only changes: each commit is a complete snapshot of all files. Hence, to find out what changed, you must start with two snapshots. Think of this as one of those Spot the Difference games: it does you no good to get one picture; you must get both.
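To make the two-snapshot point concrete, here is a minimal sketch in Python. It models each snapshot as a plain dictionary from path to a content identifier. The paths and the `blob-…` labels are invented for illustration, and this is not how Git stores trees internally; it only shows why one snapshot alone cannot tell you what was added.

```python
def added_paths(parent_snapshot, commit_snapshot):
    """Paths present in the commit's snapshot but absent from its parent's."""
    return sorted(p for p in commit_snapshot if p not in parent_snapshot)


def changed_paths(parent_snapshot, commit_snapshot):
    """Paths present in both snapshots whose content identifier differs."""
    return sorted(
        p for p, blob in commit_snapshot.items()
        if p in parent_snapshot and parent_snapshot[p] != blob
    )


# Hypothetical snapshots: every commit lists ALL files, not just new ones.
parent = {"docs/a.pdf": "blob-1", "docs/b.pdf": "blob-2"}
commit = {"docs/a.pdf": "blob-1", "docs/b.pdf": "blob-9", "docs/c.pdf": "blob-3"}
```

Given only `commit`, all three files look alike; it takes `parent` as well to see that `docs/c.pdf` is the new one.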

    As a whole, Git is designed to deliver all the snapshots. Those are commits; commits are what is in a repository; so those are what you get. If you want a different result, the Git on the server has all the snapshots and can do the comparisons, so you could write your own software that does whatever you would like done, but you would need to control the software on the Git server. Fortunately, you can clone the entire repository onto your client, and then your client is a perfectly good server.
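Once a full clone exists locally, the comparison can be done on the client. The sketch below is one possible approach, not the only one: it asks `git diff-tree` for the paths a commit added relative to its parent, then copies those files out of the working tree. The helper names are hypothetical. It assumes the commit has exactly one parent (a root commit or a merge would need extra options) and that the commit is the one currently checked out, so the working tree holds its files.

```python
import shutil
import subprocess
from pathlib import Path


def diff_tree_command(commit):
    """Build the git command listing paths added (status A) by `commit`."""
    return ["git", "diff-tree", "--no-commit-id", "--name-only",
            "--diff-filter=A", "-r", "-z", commit]


def parse_paths(raw):
    """Split NUL-separated output (from -z) into a list of paths."""
    return [p for p in raw.decode("utf-8", errors="replace").split("\0") if p]


def added_files(repo_dir, commit):
    """Run git inside an existing clone and return the added paths."""
    result = subprocess.run(diff_tree_command(commit), cwd=repo_dir,
                            check=True, stdout=subprocess.PIPE)
    return parse_paths(result.stdout)


def copy_added_files(repo_dir, commit, dest_dir):
    """Copy the added files from the working tree into dest_dir.

    Assumes `commit` is checked out in repo_dir's working tree.
    """
    copied = []
    for rel in added_files(repo_dir, commit):
        dst = Path(dest_dir) / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(repo_dir) / rel, dst)
        copied.append(rel)
    return copied
```

For the repository in the question, one would first run `git clone https://github.com/KingOfCramers/sidtoday` and then call something like `copy_added_files("sidtoday", "07b7008f215ffe784068d9d2d14fb5d76875ca24", "new_files")`, where `new_files` is an arbitrary output directory.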

    Note that once you do have a clone, git fetch into that clone uses a protocol that attempts to minimize network traffic by having the two Gits compare notes. The server then prepares a so-called thin pack that contains deltas against objects you already have, wherever that's feasible, so that you actually get just the incremental changes! But for this to work, you must have an existing clone.

    Be aware, too, that if your server is specifically GitHub, GitHub offers a REST API (well, potentially multiple APIs: the current one is version 3), and you can use that API to compare commits and to download files. See in particular https://developer.github.com/v3/git/trees/ about obtaining trees (the snapshot within each commit is a tree). Note that there are length limits that, if exceeded, will force you to clone anyway.
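As a rough illustration of the GitHub route, the sketch below uses the REST endpoint for a single commit rather than the trees endpoint linked above. To the best of my knowledge that endpoint returns a `files` list whose entries carry `filename`, `status`, and `raw_url` fields, but treat those field names, the `Accept` header value, and the helper names as assumptions to check against GitHub's current documentation. For a large commit the file list may be truncated or paginated, which is the length-limit problem mentioned above.

```python
import json
import urllib.request


def commit_api_url(owner, repo, sha):
    """URL of the REST endpoint describing one commit."""
    return f"https://api.github.com/repos/{owner}/{repo}/commits/{sha}"


def added_file_urls(commit_json):
    """Map filename -> raw download URL for files the commit added.

    Assumes each entry in "files" has "filename", "status" and "raw_url".
    """
    return {
        f["filename"]: f["raw_url"]
        for f in commit_json.get("files", [])
        if f.get("status") == "added"
    }


def fetch_commit(owner, repo, sha):
    """Download and decode the commit description (unauthenticated)."""
    request = urllib.request.Request(
        commit_api_url(owner, repo, sha),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `added_file_urls(fetch_commit("KingOfCramers", "sidtoday", "07b7008f215ffe784068d9d2d14fb5d76875ca24"))` would then give the URLs to download, subject to rate limits and to the file-count limits on the response.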