
Force xmllint to ignore bad default xmlns


I am trying to process a large number of XML files (Maven POMs) using xmllint --xpath. After some trial and error I figured out that it does not work as expected because of the bad default namespace declaration in these files, which is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

A simple command fails as follows:

$ echo $(xmllint --xpath '/project/modelVersion/text()' pom.xml )
XPath set is empty

If I get rid of the xmlns attribute, replacing the root element as follows:

<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

The previous command gives the expected output:

$ echo $(xmllint --xpath '/project/modelVersion/text()' pom.xml )
4.0.0

Changing hundreds of POM files is not an option, especially since Maven itself does not complain.

Is there a way for xmllint to process the file with the bad xmlns?

UPDATE

Thanks to Damien I was able to make some progress:

$ ( echo setns x=http://maven.apache.org/POM/4.0.0; echo 'xpath /x:project/x:modelVersion/text()'; ) | xmllint --shell pom.xml
/ > setns x=http://maven.apache.org/POM/4.0.0
/ > xpath /x:project/x:modelVersion/text()
Object is a Node Set :
Set contains 1 nodes:
1  TEXT
    content=4.0.0

But this does not quite do what I need. My follow up questions are as follows:

  1. Is there a way to print only the text? I would like the output to contain only 4.0.0 in the above example

  2. It seems the output gets truncated after about 30 characters. Is it possible to get complete output? This does not happen with xmllint --xpath


Solution

  • strip the namespace with sed

    given in pom.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <modelVersion>4.0.0</modelVersion>
    </project>
    

    this:

    cat pom.xml | sed '2 s/xmlns="[^"]*"//' | xmllint --xpath '/project/modelVersion' -
    

    returns this:

    <modelVersion>4.0.0</modelVersion>
    

    if the root element is formatted differently (for example, the xmlns attributes are on their own lines), run the file through the formatter first so the whole start tag sits on one line:

    cat pom.xml | xmllint --format - | sed '2 s/xmlns="[^"]*"//' | xmllint --xpath '/project/modelVersion' -
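
If rewriting the input with sed is undesirable, the XPath expression itself can sidestep the namespace. The sketch below is an alternative that is not part of the answer above: local-name() matches elements by name regardless of their namespace, and wrapping the path in string() should make xmllint --xpath print only the text, which also covers the first follow-up question. The function names pom_model_version and print_model_versions are made up for illustration, and the sketch assumes an xmllint build that supports --xpath.

```shell
#!/bin/sh
# Hypothetical helper: print the text of <modelVersion> from one POM file
# without modifying it. local-name() compares element names while ignoring
# the namespace; string() returns the text content instead of the node.
pom_model_version() {
    xmllint --xpath \
        'string(/*[local-name()="project"]/*[local-name()="modelVersion"])' \
        "$1"
}

# Hypothetical wrapper: print "path: version" for every POM path given.
print_model_versions() {
    for pom in "$@"; do
        printf '%s: %s\n' "$pom" "$(pom_model_version "$pom")"
    done
}
```

Calling print_model_versions with a list of POM paths (for example the result of a shell glob) prints one line per file, so the hundreds of files never need to be edited.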