command-line-interface, pom.xml, xmlstarlet

XMLStarlet does not select anything


I have a typical pom.xml and want to print the groupId, artifactId and version, separated by colons. I think XMLStarlet is the right tool for that. I tried several ways, but I always get an empty line.

xml sel -t -m project -v groupId -o : -v artifactId -o : -v version pom.xml

Expected output:

org.something.apps:app-acct:5.4

Actual output: an empty line

Even if I try to print just the groupId I get nothing:

xml sel -t -v project/groupId pom.xml

I am sure that the tool sees the elements because I can list them without problem:

xml el pom.xml

prints the following (correctly):

project
project/modelVersion
project/parent
project/parent/groupId
project/parent/artifactId
project/parent/version
project/groupId
project/artifactId
project/version
project/packaging

What's wrong?

Here is the cut-down version of pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                        http://maven.apache.org/maven-v4_0_0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.something</groupId>
        <artifactId>base</artifactId>
        <version>1.16</version>
    </parent>

    <groupId>org.something.apps</groupId>
    <artifactId>app-acct</artifactId>
    <version>5.4</version>
    <packaging>war</packaging>

</project>

Solution

  • UPDATE: since version 1.5, XMLStarlet makes the document's default namespace available under the prefix '_', so the solution reduces to this:

    xml sel -t -m _:project -v _:groupId -o : -v _:artifactId -o : -v _:version pom.xml

    Thanks @JamieNelson for the heads-up.


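    Unfortunately, XMLStarlet is very picky about the default namespace. If the document declares one (xmlns=), an unprefixed name in an XPath expression matches only elements that are in no namespace, so nothing is selected. You have to declare the namespace for XMLStarlet too (with -N) and prefix the element names with the prefix you have chosen: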

    xml sel -N my=http://maven.apache.org/POM/4.0.0 -t -m my:project -v my:groupId -o : -v my:artifactId -o : -v my:version pom.xml

    Running the above command gives the expected output:

    org.something.apps:app-acct:5.4
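
The -N binding only matches when its URI is exactly the one declared in the document. One way to avoid hard-coding it is to read the URI from the file first. The following is a minimal sketch, not a definitive implementation: the function names default_ns and pom_coords are made up for illustration, the sed expression assumes that xmlns="..." sits on the same line as "<project", and the xml command is assumed to be on the PATH (on many Linux distributions the binary is called xmlstarlet instead).

```shell
# Print the default namespace URI declared on the <project> start tag, if any.
# Assumption: xmlns="..." is on the same line as "<project".
default_ns() {
    sed -n 's/.*<project[^>]*[[:space:]]xmlns="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Print groupId:artifactId:version for the given POM (hypothetical helper).
# The prefix "my" is bound only when a default namespace was found.
pom_coords() {
    ns=$(default_ns "$1")
    if [ -n "$ns" ]; then
        xml sel -N "my=$ns" -t -m my:project \
            -v my:groupId -o : -v my:artifactId -o : -v my:version "$1"
    else
        xml sel -t -m project -v groupId -o : -v artifactId -o : -v version "$1"
    fi
}
```

Usage would be pom_coords pom.xml. The sed step is a plain text match rather than a real XML parse, so it is only suitable for conventionally formatted POM files.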
    

    However, if the document does NOT have the default namespace declared (or the namespace has a slightly different URL), the above command will NOT work, which is a real PITA. A more universal solution is to remove the default namespace declaration before selecting the elements. As of XMLStarlet 1.3.1, converting the XML to PYX format and back removes the namespace declarations:

    xml pyx pom.xml | xml p2x | xml sel -t -m project -v groupId -o : -v artifactId -o : -v version 2>nul

    UPDATE (2014-02-12): as of XMLStarlet 1.4.2 the PYX <-> XML conversion is fixed (it no longer removes namespace declarations), so the above command will NOT work (thanks to Peter Gluck for the tip). Use the following command instead:

    xml pyx pom.xml | grep -v ^A | xml p2x | xml sel -t -m project -v groupId -o : -v artifactId -o : -v version

    Note: the grep above removes ALL attributes from the document, not just namespace declarations. For this specific case (selecting element values from pom.xml, where elements with non-default namespaces are not expected) this is OK, but for general XML you would remove just the default namespace declaration(s) and nothing else:

    xml pyx pom.xml | grep -v "^Axmlns " | xml p2x | xml sel -t -m project -v groupId -o : -v artifactId -o : -v version
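
To see the difference between the two grep filters without running XMLStarlet, they can be tried on a few PYX lines. The sketch below uses a hand-written, abbreviated sample that only imitates what xml pyx pom.xml would print (in PYX, "(" opens an element, ")" closes it, "A" starts an attribute line and "-" carries text). The names pyx_sample, strip_all_attributes and strip_default_ns are made up for illustration.

```shell
# Hand-written PYX lines imitating `xml pyx pom.xml` (abbreviated, assumed sample).
pyx_sample() {
    printf '%s\n' \
        '(project' \
        'Axmlns http://maven.apache.org/POM/4.0.0' \
        'Axmlns:xsi http://www.w3.org/2001/XMLSchema-instance' \
        'Axsi:schemaLocation http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd' \
        '(groupId' \
        '-org.something.apps' \
        ')groupId' \
        ')project'
}

# Drops every attribute line, including xmlns:xsi and xsi:schemaLocation.
strip_all_attributes() { grep -v '^A'; }

# Drops only the default namespace declaration ("Axmlns" followed by a space).
strip_default_ns() { grep -v '^Axmlns '; }

# Show what survives the narrower filter.
pyx_sample | strip_default_ns
```

With the narrower filter the xmlns:xsi declaration and the xsi:schemaLocation attribute stay in place, so the xsi prefix is still declared when the stream is converted back with xml p2x.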


    Note (obsolete): the error redirection (2>nul) is necessary to hide the complaint about the (now) unknown namespace xsi:

    -:1.28: Namespace prefix xsi for schemaLocation on project is not defined

    Another way of getting rid of the complaint is to remove the schemaLocation attribute (actually, this command removes all attributes from the PYX document, not just xsi:schemaLocation):

    xml pyx pom.xml | grep -v ^A | xml p2x | xml sel -t -m project -v groupId -o : -v artifactId -o : -v version