
Repository for storing derived information (build artifacts)


I'm looking for a "repository" to store derived information (build artifacts). We have a repository (currently Mercurial) for our source code. When something is pushed to the source repository, a continuous integration server runs an incremental build, and as a result some DLLs change. These should be added to some "repository" so that everybody can use that version without having to run the build again. I'm looking for the following features:

  • It should be easy to update the source code and get the corresponding binaries (we could probably make a script for that)
  • You should easily get all binaries at once (not only those that changed during the last incremental build).
  • Binaries that weren't changed should only be stored once in the repository.
  • When updating the source code and the binaries only the changed binaries should be transferred (and not all binaries). This is similar to what happens for source code.
  • When updating to some version, only that version should be stored locally, not the complete history.
  • We should be able to remove certain versions from the binary "repository" after a while. However, if the DLLs are still needed for subsequent incremental builds, they should of course not be completely removed from the "repository".

What would fit these requirements?


Solution

  • I agree with Manfred: what you are looking for is a binary repository manager. Besides the Nexus repository manager, you should consider Artifactory.
    As for the feature list you asked about:

    • As you mentioned, the CI server should be responsible for detecting a change in version control and starting a build process that creates the binaries. If the build succeeds, the CI server/build tool should also deploy the generated binaries to the repository manager. Artifactory offers a build integration feature that takes care of deploying the binaries together with the build metadata.
    • Using the build integration feature of Artifactory, you can get a list of all the binaries generated by a specific build and download them as an archive. Artifactory provides a REST API for those actions.
    • There are different approaches to storing artifacts in a repository manager. Some tools store multiple copies of the same binary. Others, for example Artifactory, use checksum-based storage, which keeps only one copy per binary (based on its checksum). This pays off if you keep multiple copies of the same binary in different repositories, especially if you are dealing with large binaries (WAR files, Docker images, ISOs, etc.). Another benefit is cheap copies/moves between repositories, which is a common practice for promotion workflows.
    • The Artifactory build integration uses checksum-based deployment, which deploys only binaries that do not already exist in Artifactory. For binaries that do exist and have not changed, it only creates a new reference to the existing binary, saving the need to send the actual bytes.
    • Artifactory provides multiple options for cleaning up binaries, including built-in cleanup policies and the option to develop your own custom logic using user plugins and the Artifactory Query Language (AQL).
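
    To make the checksum-based ideas above concrete, here is a minimal in-memory sketch in Python. It is not Artifactory's implementation or API; the class `ChecksumStore`, its method names, and the build ids are hypothetical, and SHA-256 is an assumed choice of checksum. It only illustrates three points from the list: each distinct binary is stored once, only binaries the store has not seen need their bytes transferred, and removing an old build deletes only the binaries that no remaining build references.

```python
import hashlib


class ChecksumStore:
    """Toy checksum-based binary store (illustrative only, not a real product API)."""

    def __init__(self):
        self.blobs = {}   # digest -> bytes; each distinct binary is kept once
        self.builds = {}  # build id -> {file name: digest}

    def deploy(self, build_id, files):
        """Record a build and return the names whose bytes had to be uploaded."""
        manifest, uploaded = {}, []
        for name, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                # Unknown content: the bytes must actually be stored.
                self.blobs[digest] = data
                uploaded.append(name)
            # Known or not, the build only keeps a reference to the content.
            manifest[name] = digest
        self.builds[build_id] = manifest
        return uploaded

    def changed_files(self, from_build, to_build):
        """Names a client at from_build must download to reach to_build."""
        old = self.builds.get(from_build, {})
        new = self.builds[to_build]
        return sorted(n for n, d in new.items() if old.get(n) != d)

    def remove_build(self, build_id):
        """Drop a build, then delete blobs that no remaining build references."""
        del self.builds[build_id]
        live = {d for m in self.builds.values() for d in m.values()}
        for digest in list(self.blobs):
            if digest not in live:
                del self.blobs[digest]


store = ChecksumStore()
store.deploy("build-1", {"core.dll": b"v1", "ui.dll": b"v1-ui"})
# Only core.dll changed, so only its bytes are stored again.
second_upload = store.deploy("build-2", {"core.dll": b"v2", "ui.dll": b"v1-ui"})
to_download = store.changed_files("build-1", "build-2")
```

    In this sketch, removing `build-1` afterwards would discard the old `core.dll` content but keep the `ui.dll` content, because `build-2` still points at it. A real repository manager adds persistence, access control, and transfer over HTTP on top of the same idea.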

    In addition, I highly recommend taking a look at the binary repository comparison matrix.

    Disclaimer: I work for JFrog, the company behind Artifactory.