
Docker bundle install cache issues when updating gems


I use Docker in both development and production, and one thing that really bugs me is how simplistic the Docker build cache is. I have a Ruby application that needs bundle install to install its dependencies, so I start with the following Dockerfile:

    ADD Gemfile Gemfile
    ADD Gemfile.lock Gemfile.lock
    RUN bundle install --path /root/bundle

All dependencies are cached and it works great until I add a new gem. Even if the gem I added is just 0.5 MB, it still takes 10-15 minutes to install all application gems from scratch, and then another 10 minutes to deploy because of the size of the dependencies folder (about 300 MB).

I have encountered exactly the same problem with node_modules and npm. Has anyone found a solution to this problem?

My research results so far:

  • Source to image - caches arbitrary files across incremental builds. Unfortunately, because of the way it works, the whole 300 MB has to be pushed to a registry even when the gems have not changed. A faster build means a slower deploy, even when gems are not updated.

  • Gemfile.tip - split the Gemfile into two files and only add gems to one of them. This is very specific to bundler, and I am not convinced it will scale beyond adding 1-2 gems.

  • Harpoon - would be a good fit if it did not force you to ditch the Dockerfile and switch to its own format. That means extra pain for every new developer on the team, since the toolset takes time to learn separately from Docker.

  • Temporary package cache. This is just an idea; I am not sure whether it is possible. Somehow bring the package manager cache (not the dependencies folder) onto the machine before installing packages and then remove it afterwards. Based on my hack, it significantly speeds up package installation for both bundler and npm without bloating the machine with unnecessary cache files.


Solution

  • I cache the gems to a tar file in the application tmp directory. Then I copy the gems into a layer using the ADD command before doing the bundle install. From my Dockerfile.yml:

    WORKDIR /home/app
    
    # restore the gem cache. This only runs when
    # gemcache.tar.bz2 changes, so usually it takes
    # no time
    ADD tmp/gemcache.tar.bz2 /var/lib/gems/
    
    COPY Gemfile /home/app/Gemfile
    COPY Gemfile.lock /home/app/Gemfile.lock
    RUN gem update --system && \
        gem update bundler && \
        bundle install --jobs 4 --retry 5
    

    Make sure the gem cache is included in the build context that is sent to your Docker machine. My gem cache is 118 MB, but since I am building locally it copies quickly. My .dockerignore:

    tmp
    !tmp/gemcache.tar.bz2
    

    You need to cache the gems from a built image, but initially you may not have an image. Create an empty cache like so (I have this in a rake task):

    task :clear_cache do
      sh "tar -jcf tmp/gemcache.tar.bz2 -T /dev/null"
    end
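
The task above overwrites whatever cache is already there. A minimal sketch of a guard that only creates the empty archive when none exists might look like the following. The helper name `ensure_gem_cache` and its `runner` parameter are hypothetical (they are not part of the original Rakefile), and it assumes the same `tar -jcf ... -T /dev/null` invocation used above.

```ruby
require "fileutils"
require "shellwords"

# Hypothetical helper, not from the original Rakefile: create an empty
# gem cache archive only if no cache file exists yet.
# `runner` receives the shell command; it defaults to Kernel#system and
# can be swapped for `sh` inside a rake task.
def ensure_gem_cache(path = "tmp/gemcache.tar.bz2", runner: ->(cmd) { system(cmd) })
  # Leave an existing cache untouched.
  return false if File.exist?(path)

  # Make sure the tmp directory exists before tar writes into it.
  FileUtils.mkdir_p(File.dirname(path))
  # Same command as the clear_cache task: an archive with no members.
  runner.call("tar -jcf #{Shellwords.escape(path)} -T /dev/null")
  true
end
```

In a Rakefile this could be wrapped as `task(:ensure_cache) { ensure_gem_cache }` and run before the first build, so a populated cache is never emptied by accident.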
    

    After the image is built copy the gems to the gem cache. My image is tagged app. I create a docker container from the image, copy /var/lib/gems/2.2.0 into my gemcache using the docker cp command, and then delete the container. Here's my rake task:

    task :cache_gems do
      id = `docker create app`.strip
      begin
        sh "docker cp #{id}:/var/lib/gems/2.2.0/ - | bzip2 > tmp/gemcache.tar.bz2"
      ensure
        sh "docker rm -v #{id}"
      end
    end
    

    On the subsequent image build the gemcache is copied to a layer before the bundle install is called. This takes some time, but it is faster than a bundle install from scratch.

    Builds after that are even faster because Docker has cached the ADD tmp/gemcache.tar.bz2 /var/lib/gems/ layer. If Gemfile.lock changes, only the gems that differ from the cache are installed.

    There is no reason to rebuild the gem cache on each Gemfile.lock change. Once there are enough differences between the cache and the Gemfile.lock that a bundle install is slow, you can rebuild the gem cache. When I do want to rebuild it, it is a simple rake cache_gems command.
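
One rough way to judge when the cache has drifted far enough is to count the locked gems that are missing from the archive. The sketch below is an illustration and not part of the original setup: `missing_from_cache` is a hypothetical helper, and it assumes the archive contains directories of the form `gems/<name>-<version>/`, as a copy of /var/lib/gems/2.2.0 would. Gems from git or path sources are installed elsewhere, so they will always be reported as missing.

```ruby
require "set"

# Hypothetical helper: list "<name>-<version>" entries that appear in
# Gemfile.lock but have no matching gems/<name>-<version>/ directory
# among the archive's entry paths.
def missing_from_cache(lockfile_text, cache_entries)
  # Collect the directory names directly under any "gems/" path segment.
  cached = cache_entries
           .map { |entry| entry[%r{(?:\A|/)gems/([^/]+)/}, 1] }
           .compact
           .to_set

  # Resolved specs sit at exactly four spaces of indentation in the
  # lockfile; their dependencies are indented further and are skipped.
  locked = lockfile_text
           .scan(/^ {4}(\S+) \(([^)]+)\)$/)
           .map { |name, version| "#{name}-#{version}" }

  locked.uniq.reject { |dir| cached.include?(dir) }
end
```

The entry list could come from the output of `tar -tjf tmp/gemcache.tar.bz2`, split into lines, and the lockfile text from reading Gemfile.lock. When the returned list grows long enough that bundle install feels slow, that is the cue to run rake cache_gems.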