
Diff and Patch an existing docker image


I have a Docker image, test:1.0.0, on an external stack in a bandwidth-constrained environment. Locally I've published a new version of the image, test:2.0.0, but because of the size of these images and the network constraints, it would be difficult to ship the entire new image to replace the old one. Is there a lightweight way to patch an existing Docker image with only the changes between the old and new versions? I was considering copying out the file systems of the images and running diff and patch on them, but that doesn't seem ideal.


Solution

  • At present, there's no way to do anything like this. Registries do not process the contents of a layer; they treat it as a byte stream and verify only the digest of the entire blob for the content-addressable store. If one file in a layer changes, or anything else changes that would alter the digest, the entire layer is pushed.

    Potentially, you could do this out-of-band, with one server that extracts the image and another that recreates it. However, that will be very error-prone (e.g. changing the timestamp on one file in a layer would change the digest when it's recreated, and tar and gzip are not guaranteed to be reproducible). You would also have to build the tooling to extract the image, diff it against the previous image, transmit the diff to your other server, and apply that diff to recreate the image.
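
To illustrate why recreating a layer is fragile, here is a minimal Python sketch using only the standard library. It builds two uncompressed tar "layers" in memory with identical file contents but different timestamps, then compares the digest of the whole blob with per-file content digests. The helper names (build_layer, blob_digest, file_digests, diff_layers) are made up for this example and are not part of Docker or any registry tooling; real layers are usually gzip-compressed as well, which adds another source of non-reproducibility.

```python
import hashlib
import io
import tarfile


def build_layer(files, mtime):
    """Return the bytes of an uncompressed tar holding `files` (path -> bytes).

    Hypothetical helper: a stand-in for a real image layer.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(files):
            info = tarfile.TarInfo(name=path)
            info.size = len(files[path])
            info.mtime = mtime  # the timestamp is stored in the tar header
            tar.addfile(info, io.BytesIO(files[path]))
    return buf.getvalue()


def blob_digest(blob):
    """Digest of the entire blob, which is all a registry verifies."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def file_digests(blob):
    """Map each regular file in the tar to the SHA-256 of its contents."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                result[member.name] = hashlib.sha256(data).hexdigest()
    return result


def diff_layers(old_blob, new_blob):
    """File-level diff between two layers, ignoring metadata such as mtime."""
    old, new = file_digests(old_blob), file_digests(new_blob)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


if __name__ == "__main__":
    files = {"app/main.py": b"print('hi')\n", "etc/config": b"a=1\n"}
    first = build_layer(files, mtime=0)
    second = build_layer(files, mtime=1)  # same contents, new timestamp
    print(blob_digest(first) == blob_digest(second))   # blob digests differ
    print(file_digests(first) == file_digests(second))  # file contents match
```

A file-level diff like diff_layers tells you what to transmit, but the receiving side would still have to rebuild a byte-identical tar (same ordering, headers, timestamps, and compression output) for the layer digest to match, which is the hard part.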

    Rather than unpacking the tar+gzip layers, you could consider an alternative layer format, eStargz, which separately compresses and references each file within the tar. That may make it easier to build tooling that quickly sends a diff between two blobs in a reproducible way.

    That said, you would need to develop the tooling to implement this, and maintain a second server, since the registry would not perform the patch process. I think it will be easier and cheaper either to improve your bandwidth or to optimize your image layers so that most layers are unmodified between builds. BuildKit recently added a --link option for COPY that may reduce the dependencies between layers. And my own work on regctl image mod may allow you to remove or fix non-reproducible parts of builds (stripping timestamps or removing mutable files you don't need).
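
To judge how well your layers are already optimized, you can compare the layer digests listed in the two image manifests. The sketch below assumes you have both manifests as parsed JSON in the OCI/Docker v2 shape (a "layers" list whose entries carry "digest" and "size"); the function name and the sample digests and sizes are invented for illustration. Any layer whose digest already exists on the remote side is skipped on push or pull, so only the missing layers cost bandwidth.

```python
def layers_to_transfer(old_manifest, new_manifest):
    """Estimate which layers of the new image the remote side still needs.

    Both arguments are assumed to be image manifests parsed from JSON,
    each with a "layers" list of {"digest": ..., "size": ...} entries.
    """
    already_present = {layer["digest"] for layer in old_manifest["layers"]}
    missing = []
    seen = set()
    reused_bytes = 0
    for layer in new_manifest["layers"]:
        digest = layer["digest"]
        if digest in seen:
            continue  # the same blob is only stored and sent once
        seen.add(digest)
        if digest in already_present:
            reused_bytes += layer["size"]
        else:
            missing.append(layer)
    return {
        "missing_digests": [layer["digest"] for layer in missing],
        "bytes_to_send": sum(layer["size"] for layer in missing),
        "bytes_reused": reused_bytes,
    }


if __name__ == "__main__":
    # Made-up manifests: only the last layer differs between versions.
    old = {"layers": [
        {"digest": "sha256:base", "size": 50_000_000},
        {"digest": "sha256:deps", "size": 200_000_000},
        {"digest": "sha256:app-v1", "size": 5_000_000},
    ]}
    new = {"layers": [
        {"digest": "sha256:base", "size": 50_000_000},
        {"digest": "sha256:deps", "size": 200_000_000},
        {"digest": "sha256:app-v2", "size": 5_500_000},
    ]}
    print(layers_to_transfer(old, new))
```

If bytes_to_send is close to the full image size, an early layer is probably changing on every build and invalidating everything after it; that is where reordering build steps or removing non-reproducible content pays off.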