maven nexus artifactory archiva maven-indexer

What is in the packed Maven index (nexus-maven-repository-index.gz)?


Where can I find some more details on what is contained in a Maven repository's Maven Index? Where can one find more details on how this all works? I am somewhat familiar with the maven-indexer, but I still have some grey spots...

What is the difference between the unpacked and packed indexes?

Does a Maven proxy repository have just the remote's index, or does it also keep an index of what artifacts it currently has cached locally?

A thorough and sufficiently lengthy reply would be highly appreciated, as I'm researching the topic and there's, unfortunately, little documentation about it.


Solution

  • There's a good amount of info on it here, including some nerdier details: Nexus Indexer 2.0: incremental downloading

    To get started with your questions, nexus-maven-repository-index.gz contains the index of all of the content in the repository. Using Central as an example, this would be EVERYTHING in Central. Alongside this full index, incremental indexes are also generated, each holding the changes since the previous time the index was published. A list of these is stored in nexus-maven-repository-index.properties. The incremental indexes exist so that the full index does not need to be downloaded every time.
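As a rough illustration of how a client could use that properties file, here is a minimal Java sketch that decides between fetching incremental chunks and fetching the full index. The property keys (`nexus.index.chain-id`, `nexus.index.last-incremental`, `nexus.index.incremental-N`) and the chunk file naming (`nexus-maven-repository-index.N.gz`) reflect my understanding of what maven-indexer publishes and should be treated as assumptions; the class `IndexChunkPlanner` and its methods are hypothetical names made up for this example, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper: not part of maven-indexer or Nexus.
class IndexChunkPlanner {
    static final String BASE = "nexus-maven-repository-index";
    // Assumed key prefix for the list of published incremental chunks.
    static final String CHUNK_PREFIX = "nexus.index.incremental-";

    // Parses the text of a downloaded nexus-maven-repository-index.properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the files to download: incremental chunks when the local copy can
    // be brought up to date with them, otherwise the full packed index.
    static List<String> filesToFetch(Properties remote, String localChainId, int localCounter) {
        List<String> full = Collections.singletonList(BASE + ".gz");
        String remoteChainId = remote.getProperty("nexus.index.chain-id");
        String last = remote.getProperty("nexus.index.last-incremental");
        // A missing or different chain id means the chunks do not apply to our copy.
        if (remoteChainId == null || last == null || !remoteChainId.equals(localChainId)) {
            return full;
        }
        int remoteCounter = Integer.parseInt(last.trim());
        if (localCounter > remoteCounter) {
            return full;
        }
        // Collect the chunk numbers the remote still publishes.
        Set<Integer> published = new HashSet<>();
        for (String key : remote.stringPropertyNames()) {
            if (key.startsWith(CHUNK_PREFIX)) {
                published.add(Integer.parseInt(remote.getProperty(key).trim()));
            }
        }
        List<String> chunks = new ArrayList<>();
        for (int n = localCounter + 1; n <= remoteCounter; n++) {
            if (!published.contains(n)) {
                return full; // a needed chunk has been rotated out
            }
            chunks.add(BASE + "." + n + ".gz");
        }
        return chunks; // empty when the local index is already current
    }

    public static void main(String[] args) {
        Properties remote = parse(String.join("\n",
                "nexus.index.chain-id=example-chain",
                "nexus.index.last-incremental=5",
                "nexus.index.incremental-0=5",
                "nexus.index.incremental-1=4",
                "nexus.index.incremental-2=3"));
        System.out.println(filesToFetch(remote, "example-chain", 3));
    }
}
```

The fallback to the full index whenever a needed chunk is missing or the chain id differs is a design choice of this sketch: applying a partial set of increments would leave the local index silently incomplete.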

    Unpacked indexes are used for the search and browse-remote functionality; packed indexes are used for transfer from the remote to the proxy or tool.

    A Maven proxy repository can download the remote index, if one is available, mainly for browsing the remote's assets. This occurs in Nexus Repository 2, but not in 3. In 3, the index is downloaded and can be used by developer tools to explore the remote's contents; the biggest difference is that we don't use it to populate anything inside Nexus Repository itself. Searches by Maven will be run against the remote index if available, and then against the local index.

    Proxies in Nexus Repository Manager keep an index of their own, and will also download the remote index if it exists. The local index describes the content that is cached locally; the remote index describes the content of the remote.

    The gz file is simply a means of storing the Lucene index contents for transfer; it is unpacked upon retrieval and its contents are put into the local Lucene index. The file does not hold actual Lucene index files, just the indexed contents, which protects against future Lucene version updates.
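To make the "just the contents" point concrete, here is a small Java sketch that reads the header of a packed index. It assumes the layout I recall from maven-indexer's reader: a gzip stream that starts with a one-byte format version, an eight-byte timestamp, and a four-byte document count, followed by the documents themselves. Treat that layout as an assumption and verify it against the maven-indexer sources before relying on it. The class `PackedIndexHeader` and the `syntheticHeader` helper are hypothetical and exist only so the example runs without downloading a real index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical value class: not part of maven-indexer or Nexus.
class PackedIndexHeader {
    final int version;
    final long timestampMillis;
    final int documentCount;

    PackedIndexHeader(int version, long timestampMillis, int documentCount) {
        this.version = version;
        this.timestampMillis = timestampMillis;
        this.documentCount = documentCount;
    }

    // Reads the assumed header fields from a gzip-compressed packed index stream.
    // The caller remains responsible for closing the stream.
    static PackedIndexHeader read(InputStream packed) {
        try {
            DataInputStream in = new DataInputStream(new GZIPInputStream(packed));
            int version = in.readUnsignedByte(); // assumed: 1-byte format version
            long timestamp = in.readLong();      // assumed: index timestamp in millis
            int documents = in.readInt();        // assumed: number of documents that follow
            return new PackedIndexHeader(version, timestamp, documents);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a tiny stand-in for a real index file that contains only a header.
    static byte[] syntheticHeader(int version, long timestampMillis, int documentCount) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeByte(version);
            out.writeLong(timestampMillis);
            out.writeInt(documentCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] fake = syntheticHeader(1, 1_700_000_000_000L, 42);
        PackedIndexHeader header = read(new ByteArrayInputStream(fake));
        System.out.println("version=" + header.version
                + " timestamp=" + header.timestampMillis
                + " documents=" + header.documentCount);
    }
}
```

Because the stream carries plain field data rather than Lucene segment files, a consumer can feed the documents into whatever Lucene version it runs locally.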

    A packed index is generated when you run certain tasks in Nexus Repository Manager, such as Publish Index. These tasks run on whatever schedule you determine.

    For group repositories, an index is created from all the member indexes. This includes the remote indexes if they are available; otherwise it includes all the local indexes that we know about.

    Regardless of repository type, the Lucene index is what is checked when doing a search.

    Some extra blog posts about the Indexer: