Tags: git, git-clone, git-fetch

git fetch fails due to pack-objects failure


When I add our remote repository as upstream and try to fetch it, it fails as below:

    $ git fetch upstream
    remote: Counting objects: 11901, done.
    remote: aborting due to possible repository corruption on the remote side.
    error: pack-objects died of signal 9
    error: git upload-pack: git-pack-objects died with error.
    fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
    fatal: protocol error: bad pack header

I understand that it fails due to having huge files in the repository (which we do have), but why does it not fail when I clone the same repository? I am able to clone the repository successfully. Shouldn't the same objects be packed at the time of a clone request?


Solution

  • To expand a bit on VonC's answer...

    First, it may help to note that signal 9 refers to SIGKILL and tends to occur because the remote in question is a Linux host and the process is being destroyed by the Linux "OOM killer" (although some non-Linux systems behave similarly).
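
    If you have shell access to the remote, the kernel log is usually the quickest way to confirm that the OOM killer was involved. The exact commands and log locations vary by distribution, so treat the following as a sketch rather than a recipe:

        # Look for OOM-killer activity in the kernel ring buffer...
        $ dmesg | grep -i -E 'oom|killed process'

        # ...or, on systemd-based hosts, in the kernel messages of the journal
        $ journalctl -k | grep -i -E 'oom|killed process'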

    Next, let's talk about objects and pack-files. A git "object" is one of the four types of items found in a git repository: a "blob" (a file); a "tree" (a list of blobs, their modes, and their names-as-stored-in-a-directory: i.e., what will become a directory or folder when a commit is checked out); a "commit" (which gives the commit author, message, and top-level tree, among other data); and a "tag" (an annotated tag). Objects can be stored as "loose objects", with one object in a file all by itself; but these can take up a lot of disk space, so they can instead be "packed": many objects in one file, with extra compression added.
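
    You can poke at both of these ideas in any repository. For example (the object hash below is only a placeholder), git cat-file reports an object's type and git count-objects summarizes how much of the repository is loose versus packed:

        # Show the type (blob, tree, commit, or tag) of a given object
        $ git cat-file -t <some-object-hash>

        # "count"/"size" describe loose objects; "in-pack"/"size-pack" describe packed ones
        $ git count-objects -v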

    Making a pack out of a lot of loose objects, i.e., doing this compression, is (or at least can be) a CPU- and memory-intensive operation. The amount of memory required depends on the number of objects and their underlying sizes: large files take more memory, and many large files take a whole lot of memory.
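
    If you administer the remote, the pack.* configuration lets you cap how much memory git-pack-objects will try to use. The settings below are real, but the values are only illustrative; you would tune them to the machine:

        # Run these in the server-side repository
        $ git config pack.windowMemory 100m     # per-thread memory window for delta search
        $ git config pack.threads 1             # fewer threads means lower peak memory
        $ git config pack.packSizeLimit 200m    # split output into packs no larger than this
        $ git config core.bigFileThreshold 50m  # skip delta compression for files above this size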

    Next, as VonC noted, git clone skips the attempt to use "thin" packs (well, normally anyway). This means the server just delivers the pack-files it already has. This is a "memory-cheap" operation: the files already exist and the server need only deliver them.
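
    You can see what a clone would be served from by listing the pack directory of the server repository (path shown for a bare repository; adjust to your layout):

        # Each pack-*.pack file (with its .idx index) already exists on disk,
        # so serving a clone from it is cheap
        $ ls /path/to/repo.git/objects/pack/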

    On the other hand, git fetch tries, if it can, to avoid sending a lot of data that the client already has. Using a "smart" protocol, the client and server engage in a sort of conversation, which you can think of as going something like this:

    • "I have object A, which needs B and C; do you have B and C? I also have D, E, and F."
    • "I have B but I need C, and I have D and E; please send me A, C, and F."

    Thus informed, the server extracts the "interesting" / "wanted" objects out of the original packs, and then attempts to compress them into a new (but "thin") pack. This means the server will invoke git-pack-objects.
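
    If you are curious, you can watch roughly this want/have negotiation, and the resulting pack transfer, from the client side by enabling git's packet tracing. The output is verbose, but the "want" and "have" lines map directly onto the conversation sketched above:

        # Trace the pack protocol conversation during the failing fetch
        $ GIT_TRACE_PACKET=1 git fetch upstream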

    If the server is low on memory ("low" being relative to the amount that git-pack-objects is going to need), the kernel is likely to invoke the OOM killer. Since git-pack-objects is memory-intensive, that process is a likely candidate for the OOM killer to kill. You then see, on your client end, a message about git-pack-objects dying from signal 9 (SIGKILL).
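
    If you control the server and keep hitting this, one common mitigation (in addition to the pack.* limits above) is to repack the repository once, offline, with conservative limits. The flags below are real git repack options; the sizes are only examples. A fetch-time git-pack-objects can then often reuse deltas from the existing packs rather than recomputing them, which lowers its memory needs:

        # On the server (bare repository path assumed): repack once with tight limits
        $ git -C /path/to/repo.git repack -a -d --window-memory=100m --max-pack-size=200m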

    (Of course it's possible the server's OOM killer kills something else entirely, such as the bug database server. :-) )