bash command-line-interface pom.xml xmlstarlet

How can I get XMLStarlet to extract multiple developer roles when parsing pom.xml?


I am using a bash script to parse hundreds of POMs, arranged in a weird hierarchy, and extract an overview of all the projects into a single report (the kind of thing that maven-info-projects:project-team can't do in one go). For undisclosed reasons, I don't want to touch a parent POM or configure maven-info-projects sections.

I am using XMLStarlet because it is installed, and xmllint is not.

Given a pom.xml extract that contains:

<developer>
   <id>devId</id>
   <name>Developer Name</name>
   <email>dev@nowhere.com</email>
   <roles>
      <role>Project manager</role>
      <role>Developer</role>
   </roles>
</developer>

How can I extract all developer info, including the multiple roles, with a single call to XMLStarlet?

At the moment, I can extract the bulk of my information with:

# Developers
locate_section_values "$pom_file_name" "/x:project/x:developers/x:developer" \
    "concat( \
        x:id, '|', x:name, '|', x:email, '|', x:roles, '|', \
        x:organization, '|', x:organizationUrl, '|', x:timezone
     )"

where

function locate_section_values(){
  local xml_file=$1
  local section=$2
  local value_table=$3

  OLD_IFS=$IFS
  IFS=$'\n'
  xml_values=()
  xml_values=($(xmlstarlet sel -B -N x="http://maven.apache.org/POM/4.0.0" -t -m "$section" -v "$value_table" --nl "$xml_file"))
  IFS=$OLD_IFS
}

I then split the results:

  for developer in "${xml_values[@]}"; do
    IFS='|'
    set -- $developer # split into $1, $2, etc. using | as separator
    #echo "id:${1}, name:${2}, roles:${4}"

    if [ -n "${1}" ]; then # id
      developer_id=${1}
      developer_ids+=( $developer_id )
    fi
    ...

The problem is, a developer with multiple roles gets their roles concatenated:

 Project managerDeveloper

Is there a way to tell the original call to xmlstarlet to combine multiple roles into, for example, a comma-separated list?


Solution

  • I think the following gives approximately what you want, but you'll have to change the interface to locate_section_values:

    xmlstarlet sel -T -B -N x="http://maven.apache.org/POM/4.0.0" \
       -t -m "/x:project/x:developers/x:developer" -v "x:id" -o "|" \
       -v "x:name" -o "|" -v "x:email" -o "|" \
       -m "x:roles/x:role" -v "." -o "," -b -o "|" \
       -v "x:organization" -o "|" -v "x:organizationUrl" -o "|" \
       -v "x:timezone" --nl \
       "$pom_file_name"
    

    That produces the roles as a comma-terminated list (every role, including the last, is followed by a comma) because that is easier to code.
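
If the trailing comma is unwanted, one option is to remove it afterwards in bash rather than in the xmlstarlet template. This is a minimal sketch: the sample line is made up to look like one line of output from the command above, and it assumes that no other field legitimately ends in a comma.

```shell
#!/usr/bin/env bash
# Hypothetical sample of one output line from the xmlstarlet call above;
# the roles field (4th) ends with a comma. Org, URL and timezone are invented.
line='devId|Developer Name|dev@nowhere.com|Project manager,Developer,|Example Org|http://example.org|+1'

# Replace every ",|" with "|" so the comma-terminated list becomes
# a comma-separated one. Assumes no other field ends in a comma.
cleaned=${line//,|/|}

echo "$cleaned"
```

The same substitution could be applied to each element of xml_values before splitting it into fields.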


    A version of locate_section_values that takes the xmlstarlet template options as extra arguments, so no eval is needed:

    function locate_section_values() {
        local xml_file=$1 # $local_project_dir/$fixed_name/pom.xml
        local section=$2 #/x:project/x:modules/x:module
        local value_table=("${@:3}")
    
        OLD_IFS=$IFS
        IFS=$'\n'
        xml_values=($(xmlstarlet sel -B -N x=http://maven.apache.org/POM/4.0.0 \
            -t -m "$section" "${value_table[@]}" --nl "$xml_file"))
        IFS=$OLD_IFS
    }
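
As an alternative to saving and restoring IFS, bash 4 or later offers `mapfile`, which reads one line per array element and avoids the glob expansion that an unquoted `$(...)` is subject to. The sketch below is not part of the function above; `emit_rows` is a hypothetical stand-in for the xmlstarlet call so that the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the xmlstarlet call: prints one record per line.
emit_rows() {
    printf '%s\n' 'dev1|Alice *|a@example.com' 'dev2|Bob|b@example.com'
}

# mapfile (bash >= 4) stores each input line as one array element;
# -t strips the trailing newline. No IFS changes and no globbing,
# so the "*" in the first record is kept literally.
mapfile -t xml_values < <(emit_rows)

echo "${#xml_values[@]} rows"
printf '%s\n' "${xml_values[@]}"
```

In locate_section_values, the process substitution would wrap the xmlstarlet command instead of `emit_rows`.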
    

    call:

    locate_section_values "$pom_file_name" '/x:project/x:developers/x:developer' \
          -v 'x:id' -o '|' -v 'x:name' -o '|' -v 'x:email' -o '|' \
          -m 'x:roles/x:role' -v '.' -o ', ' -b -o '|' \
          -v 'x:organization' -o '|' -v 'x:organizationUrl' -o '|' \
          -v 'x:timezone'
    

    loop over developers and extract fields:

    for developer in "${xml_values[@]}"; do
        # get | separated fields
        IFS='|' read -r id name email roles org orgUrl timezone <<<"$developer"
    
        # strip the trailing ", "; assigned unconditionally so a developer
        # without roles does not inherit the previous developer's value
        developer_roles_csv=${roles%, }
    
        echo "$name ($id) has roles: $developer_roles_csv."
    
    done # developer
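
If the individual roles are needed rather than the combined string, the roles field can be split further into a bash array. This is a rough sketch, assuming the roles were emitted with `-o ', '` as in the call above and that no role name contains a comma; `role_list` is a name introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical roles field as produced with -o ', ' after each role.
roles='Project manager, Developer, '

developer_roles_csv=${roles%, }   # strip the trailing ", "

# Split on commas only, so "Project manager" stays one element.
IFS=',' read -r -a role_list <<<"$developer_roles_csv"

# Trim the single leading space left after each comma.
for i in "${!role_list[@]}"; do
    role_list[i]=${role_list[i]#" "}
done

printf '[%s]\n' "${role_list[@]}"
```

Splitting on the comma alone, and trimming the space afterwards, keeps multi-word roles intact; using `IFS=', '` would also split on the space inside "Project manager".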