Using Ansible to modify pom.xml


I want to modify the pom.xml files on the fly to make Maven use the locally-installed JARs for a specific groupId. So, I need to:

  1. Change/set the scope of each matching dependency to system.
  2. Add the systemPath to each matching dependency, pointing it to a file, the name of which will be a function of the artifactId.

For example, this:

                <dependency>
                       <groupId>myGroup</groupId>
                       <artifactId>myGroup-agent-api</artifactId>
                       <version>3.1.38.13</version>
                       <scope>provided</scope>
                </dependency>

needs to become:

                <dependency>
                       <groupId>myGroup</groupId>
                       <artifactId>myGroup-agent-api</artifactId>
                       <version>3.1.38.13</version>
                       <scope>system</scope>
                       <systemPath>${application_path}/jar/agent-api.jar</systemPath>
                </dependency>

The first part I figured out. This Ansible-task using the xml-module should be changing/setting the scope:

- name: Set scope to system for myGroup if {{ module.name }} uses any
  xml:
    path: '{{ app_path }}/{{ module.name }}/pom.xml'
    namespaces:
      x: http://maven.apache.org/POM/4.0.0
    xpath: '//x:dependency/x:groupId[text()="myGroup"]/../x:scope'
    value: system
  register: xmlfound
  when: java_files.matched
  failed_when:
    - xmlfound is failed
    - >-
      'in order to spawn nodes' not in xmlfound.msg

(The fiddling with failed_when is necessary, because lxml throws a fit over the xpath if it cannot find the matching dependencies.)

But how would I achieve the second part -- generating the systemPath based on the artifactId of each matching dependency?

Update: answering the questions in comments:

  1. Yes, I'm fairly certain I want to use Ansible -- the change is "mechanical", and there is little purchase in adding a new profile, as it would just increase the maintenance burden on developers, who build on their own desktops -- rather than the server -- and need the JARs downloaded from a repository. What I mean by "on the fly" is Ansible checking the sources out from git, massaging the pom.xml and invoking Maven -- locally, on each server meant to run these programs.
  2. Yeah, I would use add_children, but what would the argument be? Each dependency's systemPath is different, derived from the artifactId.
  3. Thanks for the xpath. Unfortunately, it still triggers errors for POMs without a single matching dependency, so I still need the failed_when hackery...

Solution

  • Disclaimer: this is only to answer the original question. To solve the problem, use Maven profiles instead.

    Consider the following pom.xml example:

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>your-group</groupId>
        <artifactId>your-artifact</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <packaging>jar</packaging>
        <dependencies>
            <dependency>
                <groupId>myGroup</groupId>
                <artifactId>myGroup-agent-api</artifactId>
                <version>3.1.38.13</version>
                <classifier>yaml</classifier>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>my-artifact</artifactId>
                <version>1.2.3</version>
            </dependency>
            <dependency>
                <groupId>myGroup</groupId>
                <artifactId>myGroup-another-api</artifactId>
                <version>3.5.79.0</version>
                <scope>provided</scope>
            </dependency>
        </dependencies>
    </project>
    

    But how would I achieve the second part -- generating the systemPath based on the artifactId of each matching dependency?

    Unfortunately, this use case seems to be too complex for the xml module. First, you have to get the list of those dependencies, and here's the trouble: the list of matches is flat. So you'll have to recreate the structure yourself, which is not a trivial task, especially if your dependency nodes have an uneven number of nested nodes. I haven't used XPath for years, so my expression may not be the best one, but here's what I'm talking about (given a simple pom.xml where 2 of 3 dependencies are on artifacts with the "myGroup" groupId):

    # playbook.yaml
    ---
    - name: Modify the pom.xml
      hosts: localhost
      connection: local
      gather_facts: false
      tasks:
        - name: Find the dependencies on myGroup modules
          xml:
            path: 'pom.xml'
            namespaces:
              x: http://maven.apache.org/POM/4.0.0
            xpath: '//x:dependency/x:groupId[text()="myGroup"]/..//*'
            content: text
          register: xmlfound
    
        - name: Show the results
          debug:
            var: xmlfound
    
    TASK [debug] **************************************************************************
    ok: [localhost] => 
      xmlfound:
        actions:
          namespaces:
            x: http://maven.apache.org/POM/4.0.0
          state: present
          xpath: //x:dependency/x:groupId[text()="myGroup"]/..//*
        changed: false
        count: 8
        failed: false
        matches:
        - '{http://maven.apache.org/POM/4.0.0}groupId': myGroup
        - '{http://maven.apache.org/POM/4.0.0}artifactId': myGroup-agent-api
        - '{http://maven.apache.org/POM/4.0.0}version': 3.1.38.13
        - '{http://maven.apache.org/POM/4.0.0}scope': provided
        - '{http://maven.apache.org/POM/4.0.0}groupId': myGroup
        - '{http://maven.apache.org/POM/4.0.0}artifactId': myGroup-another-api
        - '{http://maven.apache.org/POM/4.0.0}version': 3.5.79.0
        - '{http://maven.apache.org/POM/4.0.0}scope': provided
        msg: 8
    

    And I'm not even talking about iterating over all the POMs in the projects, nested loops, and the fact that you need to use win_xml module if you have Windows machines.

    Instead, you can use the ansible.utils.from_xml filter to load the whole file, and then find the dependencies to process using the selectattr Jinja filter:

    # playbook.yaml
    - name: Modify the pom.xml
      hosts: localhost
      connection: local
      gather_facts: false
      vars:
        vendor_group_id: myGroup
      tasks:
        - name: Read the pom.xml
          set_fact:
            current_pom: "{{ lookup('file', 'pom.xml') | ansible.utils.from_xml }}"
    
        - name: Detect the dependencies to process
          set_fact:
            dependencies_to_process: >-
              {{
                current_pom.project.dependencies.dependency 
                | selectattr('groupId', 'equalto', vendor_group_id)
              }}
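
    One caveat with this step: as far as I know, from_xml is backed by xmltodict by default, which yields a mapping rather than a list when there is only one dependency node, and nothing usable when dependencies is empty or missing. Below is a sketch of a more tolerant variant of the task above. It is untested against every POM layout, and raw_dependencies is a name made up for illustration:

    ```yaml
    # Sketch only: normalize the parsed dependencies into a list before filtering.
    - name: Detect the dependencies to process (tolerating zero or one dependency)
      vars:
        # Assumption: a missing or empty <dependencies> should be treated as "no dependencies".
        raw_dependencies: "{{ (current_pom.project.dependencies | default({}, true)).dependency | default([]) }}"
      set_fact:
        dependencies_to_process: >-
          {{
            ([raw_dependencies] if raw_dependencies is mapping else raw_dependencies)
            | selectattr('groupId', 'equalto', vendor_group_id)
            | list
          }}
    ```

    With this, a POM without any matching dependency simply produces an empty list, so any later loop over it does nothing instead of failing.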
    

    The next step is to use add_children. The tricky parts here are the following:

    • it adds the elements only to the last match;
    • the added elements break the formatting. The pretty_print: true helps, but it changes the indentation from 4 spaces (typical for POMs) to 2 spaces;
    • it is not idempotent - so next time you run the playbook, it will add the systemPath nested node again.

    To overcome this, you need to:

    • search for the artifactId nodes instead of groupId ones, iterating over the list built in the previous task;
    • add the systemPath nested node only if the list item does not contain it yet;
    • restore the formatting in the end using xmllint. Of course, this could (and generally should) be done using Ansible built-in modules such as replace, but in this case it would be too complex a task. The simplest solution would be to use the ansible.utils.to_xml filter, but it will lose the comments if they were present.

    Here's an almost (see notes below) full, idempotent example (note that the file name is derived by removing the vendor_group_id + "-" prefix from the artifactId - it can differ in your case):

    - name: Modify the pom.xml
      hosts: localhost
      connection: local
      gather_facts: false
      vars:
        vendor_group_id: myGroup
      tasks:
        - name: Read the pom.xml
          set_fact:
            current_pom: "{{ lookup('file', 'pom.xml') | ansible.utils.from_xml }}"
    
        - name: Detect the dependencies to process
          set_fact:
            dependencies_to_process: >-
              {{
                current_pom.project.dependencies.dependency
                | selectattr('groupId', 'equalto', vendor_group_id)
              }}
    
        - name: Set scope to system for myGroup dependencies, if any
          xml:
            path: 'pom.xml'
            namespaces:
              x: http://maven.apache.org/POM/4.0.0
            xpath: '//x:dependency[x:groupId[text()="{{ vendor_group_id }}"]]/x:scope'
            value: system
    
        - name: Set systemPath for myGroup
          xml:
            path: 'pom.xml'
            namespaces:
              x: http://maven.apache.org/POM/4.0.0
            xpath: '//x:dependency[x:artifactId[text()="{{ item.artifactId }}"]]'
            pretty_print: true
            add_children:
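              # Remove the literal "<vendor_group_id>-" prefix; str.lstrip would
              # strip a set of characters rather than a prefix.
              - systemPath: >-
                  ${application_path}/jar/{{ item.artifactId | regex_replace('^' ~ (vendor_group_id | regex_escape) ~ '-', '') }}.jar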
          loop: '{{ dependencies_to_process }}'
          register: add_children_result
          when: item.systemPath is not defined
    
        - name: Restore the indentation
          environment:
            XMLLINT_INDENT: '    '
          command: 'xmllint pom.xml --output pom.xml --format'
          when: add_children_result is defined and add_children_result.changed
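
    A word of caution about deriving the file name: Python's str.lstrip, which is what you get when calling .lstrip() on a string in Jinja, removes any leading characters that belong to the given set, not a literal prefix. The play below is a small sketch for comparing it with the regex_replace and regex_escape filters; sample_artifact_id is a hypothetical value chosen to expose the difference:

    ```yaml
    # Sketch: run against a made-up artifactId to see how the two approaches differ.
    - name: Compare prefix-stripping approaches
      hosts: localhost
      connection: local
      gather_facts: false
      vars:
        vendor_group_id: myGroup
        sample_artifact_id: myGroup-producer-api  # hypothetical artifactId
      tasks:
        - name: Show both results
          debug:
            msg:
              # Strips every leading character found in "myGroup-"
              lstrip: "{{ sample_artifact_id.lstrip(vendor_group_id + '-') }}"
              # Strips only the literal "myGroup-" prefix
              regex_replace: "{{ sample_artifact_id | regex_replace('^' ~ (vendor_group_id | regex_escape) ~ '-', '') }}"
    ```

    For this sample, lstrip also eats the leading "pro" of "producer" (giving "ducer-api"), while regex_replace keeps "producer-api". For the two artifactIds in the example pom.xml both approaches give the same result.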
    

    NOTES:

    The fiddling with failed_when is necessary, because lxml throws a fit over the xpath if it cannot find the matching dependencies

    I wouldn't say "necessary", as you can process the list of the matching dependencies as I suggested. Moreover, you probably want to add the scope if none was set.
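
    A minimal sketch of that idea, assuming dependencies_to_process was built as in the playbook above, and assuming the xml module creates a missing node when value is given and the parent node exists (the "in order to spawn nodes" message suggests that it tries to). I did not verify this against every version of the module:

    ```yaml
    # Sketch only: one xml call per matching dependency instead of one XPath for all.
    - name: Set (or create) the scope for each matching dependency
      xml:
        path: 'pom.xml'
        namespaces:
          x: http://maven.apache.org/POM/4.0.0
        # The parent dependency is known to exist because it came from the parsed POM
        xpath: '//x:dependency[x:artifactId[text()="{{ item.artifactId }}"]]/x:scope'
        value: system
      loop: '{{ dependencies_to_process }}'
    ```

    Since the loop runs over an empty list when nothing matches, there is no XPath lookup that could fail in that case, and so no failed_when is needed.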

    Important: I did not even take dependencyManagement into consideration for the sake of simplicity. As you might see, this solution is already overcomplicated even for a simple case.