
Set up custom cleanup for Docker images in a Nexus repository


I've been managing the Docker images stored in our Nexus repository with cleanup policies. These are good for basic behavior and are applied by tasks that run daily (or hourly, or whatever you want), configured like so:

  1. The first task is of type Admin - Cleanup repositories using their associated policies
  2. The second is of type Docker - Delete unused manifests and images
  3. The third is of type Admin - Compact blob store

The cleanup policy has a regex, to avoid deleting an image tagged in a certain way (e.g. build-latest), and a "last downloaded" threshold (e.g. 5 days).
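As a rough illustration of the kind of regex involved, the sketch below uses a negative lookahead so that anything containing build-latest is never matched. The pattern and the sample paths are assumptions made up for illustration, not the policy actually configured here, and the asset path format in your Nexus instance may differ.

```groovy
// Hypothetical pattern: matches any path that does NOT contain "build-latest".
def deletable = ~'^(?!.*build-latest).*$'

// Hypothetical asset paths, for illustration only.
def paths = [
    'some-project/service-A/build-98',
    'some-project/service-A/build-latest',
]

// Keep only the paths the pattern would select for cleanup.
def matched = paths.findAll { it ==~ deletable }
println matched
```

Only paths without build-latest end up in matched, so a policy built on such a pattern would leave the build-latest tag alone.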

This helps delete images every X days, but some images need to be kept as long as no other image exists, i.e. if the only image that exists is build-99, do not delete it. That is something I couldn't do with policies alone.

This is what the repository looks like for what I want to achieve:

[image: repo example]

my-repository is just a folder name, which by default takes the repository name; it is only there for demonstration.

So how do you manage this?

Note: the information used here can be found in various SO posts and on GitHub.


Solution

  • I was able to do this with a Groovy script that runs automatically every day. The script is set in a task of type Admin - Execute script, which is disabled by default in newer Nexus versions. I enabled it by following Scripting Nexus Repository Manager 3 in the FAQ section, as well as How to Determine the Location of the Nexus 3 Data Directory.
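For reference, on recent versions enabling script creation comes down to a single property. This is a minimal sketch, assuming the file is etc/nexus.properties under the Nexus data directory and that Nexus is restarted afterwards; check the linked FAQ for your version.

```properties
# Assumed location: <data-directory>/etc/nexus.properties
# Allows scripts to be created and edited; takes effect after a restart.
nexus.scripts.allowCreation=true
```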

    The script is based on documentation, issues, and code from different places (e.g. StorageTxImpl.java is where you can find the methods that fetch and delete assets, components, etc.). It was also inspired by Using the Nexus3 API how do I get a list of artifacts in a repository, NEXUS-14837, and Nexus 3 Groovy Script development environment setup.

    The script, listed in full at the end of this answer, must run before the second task (Docker - Delete unused manifests and images). It can run at the same time as the first task, or before or after it; that doesn't matter. The policies were no longer needed, so they were unassigned from the repository.

    How it works:

    • fetch the repository
    • fetch the components of the repository
    • group them by name (e.g. repository/my-repository/some-project/service-A)
    • for each service, loop over its components and get their assets
    • sort the assets by last_downloaded and select everything except the N most recent (delAllButMostRecentNImages, 2 in the script) for deletion
    • delete the components related to those assets (Nexus's deleteComponent(cp) internally deletes the assets and their blobs)
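The selection logic in these steps can be tried outside Nexus. The sketch below is a simplified stand-in, not the script itself: the components are plain maps with made-up names and dates, and selectForDeletion and keepMostRecent are hypothetical names introduced for illustration. It only mirrors the grouping, the size threshold, the ignored versions, and the "keep the N most recently downloaded" rule.

```groovy
import java.time.LocalDate

// Settings mirroring the constants used in the script (names partly hypothetical).
def ignoreVersions = ['build-latest']
def processIfSizeGt = 3
def keepMostRecent = 2

// Hypothetical stand-in data: one map per component (image tag).
def components = [
    [name: 'some-project/service-A', version: 'build-latest', lastDownloaded: LocalDate.parse('2023-01-10')],
    [name: 'some-project/service-A', version: 'build-97', lastDownloaded: LocalDate.parse('2023-01-02')],
    [name: 'some-project/service-A', version: 'build-98', lastDownloaded: LocalDate.parse('2023-01-05')],
    [name: 'some-project/service-A', version: 'build-99', lastDownloaded: LocalDate.parse('2023-01-08')],
    [name: 'some-project/service-B', version: 'build-12', lastDownloaded: LocalDate.parse('2022-06-01')],
]

def selectForDeletion = { List cps ->
    // Group by service name, then pick deletion candidates per service.
    cps.groupBy { it.name }.collectMany { name, group ->
        // Services at or below the threshold are left untouched.
        if (group.size() <= processIfSizeGt) {
            return []
        }
        group.findAll { !(it.version in ignoreVersions) } // never touch ignored tags
             .sort(false) { it.lastDownloaded }           // ascending, without mutating
             .reverse()                                   // most recent first
             .drop(keepMostRecent)                        // what remains gets deleted
    }
}

def toDelete = selectForDeletion(components)
println toDelete*.version
```

With this sample data, service-B is skipped because it has only one image, and for service-A only the least recently downloaded non-ignored tag is selected.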

    note: I saw scripts can be parameterized but it was not needed in my case
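For anyone who does want parameters, one possible approach is to pass a JSON string and parse it with JsonSlurper. This is only a sketch under assumptions: as far as I know, Nexus exposes the request body to the script as a String binding named args when the script is run through the REST API, and the parameter names repoName and keep are made up here. The local args variable below is a stand-in so the snippet runs on its own.

```groovy
import groovy.json.JsonSlurper

// Stand-in for the `args` binding (assumption: Nexus passes the raw body as a String).
def args = '{"repoName": "my-repository", "keep": 2}'

// Fall back to an empty map when no arguments were supplied.
def params = args ? new JsonSlurper().parseText(args) : [:]

// Hypothetical parameter names, with defaults matching the script's constants.
def repoName = params.repoName ?: 'my-repository'
def keep = (params.keep ?: 2) as int

println "repo=${repoName}, keep=${keep}"
```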

    note: this can be updated to loop all repositories but I just needed one

    import org.sonatype.nexus.repository.storage.Query
    import org.sonatype.nexus.repository.storage.StorageFacet
    
    class RepositoryProcessor {
        private final log
        private final repository
        private final String repoName = 'my-repository'
        private final String[] ignoreVersions = ['build-latest']
        private final int processIfSizeGt = 3
        private final int delAllButMostRecentNImages = 2
    
        RepositoryProcessor(log, repository) {
            this.log = log
            this.repository = repository
        }
    
        void processRepository() {
            def repo = repository.repositoryManager.get(repoName)
            log.debug("found repository: {}", repo)
            // will use default of sonatype
            // https://github.com/sonatype/nexus-public/blob/master/components/nexus-repository/src/main/java/org/sonatype/nexus/repository/storage/StorageFacetImpl.java
            StorageFacet storageFacet = repo.facet(StorageFacet)
            log.debug("initiated storage facet: {}", storageFacet.toString())
            // tx of type https://github.com/sonatype/nexus-public/blob/master/components/nexus-repository/src/main/java/org/sonatype/nexus/repository/storage/StorageTxImpl.java $$EnhancerByGuice ??
            def transaction = storageFacet.txSupplier().get()
            log.debug("initiated transaction instance: {}", transaction.toString())
    
            try {
                transaction.begin()
    
                log.info("asset count {}", transaction.countAssets(Query.builder().build(), [repo]))
                log.info("components count {}", transaction.countComponents(Query.builder().build(), [repo]))
    
                // queried db is orientdb, syntax is adapted to it
                def components = transaction.findComponents(Query.builder()
                        // .where("NOT (name LIKE '%service-A%')")
                        // .and("NOT (name LIKE '%service-B%')")
                        .build(), [repo])
                // "cp" and "cpt" both stand for component
                // group by name eg: repository/my-repository/some-project/service-A
                def groupedCps = components.groupBy{ it.name() }.collect()
    
                // fetch assets for each cp
                // and set them in maps to delete the old ones
                groupedCps.each{ cpEntry ->
                    // process only if the service has more images than the configured minimum
                    if (cpEntry.value.size() > processIfSizeGt) {
                        // single component processing (i.e this would be done for each service)
                        def cpMap = [:] // map with key eq id
                        def cpAssetsMap = [:] // map of cp assets where key eq cp id
                        // process service cpts
                        cpEntry.value.each { cp ->
                            // cp id of type https://github.com/sonatype/nexus-public/blob/master/components/nexus-orient/src/main/java/org/sonatype/nexus/orient/entity/AttachedEntityId.java
                            def cpId = cp.entityMetadata.id.identity
                            // asset of type: https://github.com/sonatype/nexus-public/blob/master/components/nexus-repository/src/main/java/org/sonatype/nexus/repository/storage/Asset.java
                            def cpAssets = transaction.browseAssets(cp).collect()
                           
                            // document of type https://github.com/joansmith1/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/record/impl/ODocument.java
                            // _fields of type: https://github.com/joansmith1/orientdb/blob/master/core/src/main/java/com/orientechnologies/orient/core/record/impl/ODocumentEntry.java
                            // any field is of type ODocumentEntry.java
                            // append to map if it does not belong to the ignored versions
                            if (!(cp.entityMetadata.document._fields.version.value in ignoreVersions)) {
                                cpMap.put(cpId, cp)
                                cpAssetsMap.put(cpId, cpAssets)
                            }
                        }
                        // log info about the affected folder/service
                        log.info("cp map size: {}, versions: {}",
                                cpMap.values().size(),
                                cpMap.values().entityMetadata.document._fields.version.value)
                        log.debug("cp map assets of size: {}", cpAssetsMap.values().size())
                        // sort asc by last_downloaded (the default), reverse to get desc,
                        // then drop the N most recent so only deletion candidates remain
                        def sortedFilteredList = cpAssetsMap.values()
                                .sort { it.entityMetadata.document._fields.last_downloaded?.value[0] } // extract Date element using [0]
                                .reverse(true)
                                .drop(delAllButMostRecentNImages)
                        // list of cp ids from the assets that are going to be deleted
                        def sortedAssetsCps = sortedFilteredList.entityMetadata.document._fields.component?.value?.flatten()
                        log.info("cp map assets size after filtering {}", sortedFilteredList.size())
                        // this will print the cps ids to delete
                        log.debug("elements to delete : sorted assets cps list {}", sortedAssetsCps)
                        // deleting components and their assets
                        cpMap.findAll { it.key in sortedAssetsCps }
                                .each { entry ->
                                    log.info("deleting cp version {}", entry.value.entityMetadata.document._fields.version?.value)
                                    // this will call delete asset internally, and by default will delete blob
                                    transaction.deleteComponent(entry.value)
                                }
                    }
                }
                transaction.commit()
            } catch (Exception e) {
                // pass the exception as the last argument so the stack trace is logged
                log.warn("transaction failed, rolling back", e)
                transaction.rollback()
            } finally {
                transaction.close()
            }
        }
    }
    
    new RepositoryProcessor(log, repository).processRepository()