
Invalid formatting for empty XML tags using xmllint


I have an unformatted XML document, like this one:

<foo>
<bar>
<hop>
<hey>
</hey>
</hop>
</bar>
</foo>

As you can see, the tag "hey" is empty. I remember that in this case it should be written as <hey/>, but that's not something I can change.

To format this document, I use the xmllint --format command. But instead of outputting

<foo>
  <bar>
    <hop>
      <hey>
      </hey>
    </hop>
  </bar>
</foo>

or

<foo>
  <bar>
    <hop>
      <hey></hey>
    </hop>
  </bar>
</foo>

it outputs

<foo>
  <bar>
    <hop>
      <hey>
</hey>
    </hop>
  </bar>
</foo>

which is not what I want. I tried to write a sed command to indent these particular tags after xmllint has run, but I couldn't prevent sed from loading the whole (huge) XML file, and it took several minutes, longer than xmllint itself.

The solution would be an option that tells xmllint to format these tags correctly, but I could not find one in the man page. Do you know of anything that might help?


Solution

  • The hey element isn't empty: it contains a single text node whose value is a newline character. Tools that reformat XML will typically respect that and leave the content of the element unchanged.
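
To see this concretely, here is a minimal sketch in Python (not xmllint) using the standard library's xml.etree.ElementTree. It first shows that the parser reports a newline as the text of hey, then drops whitespace-only text from childless elements before indenting. The helper names strip_blank_leaf_text and reformat are made up for illustration, ET.indent needs Python 3.9 or later, and the whole tree is loaded into memory, which may be a problem for a very large file.

```python
import xml.etree.ElementTree as ET

# The unformatted document from the question.
SAMPLE = "<foo>\n<bar>\n<hop>\n<hey>\n</hey>\n</hop>\n</bar>\n</foo>"


def strip_blank_leaf_text(root):
    """Drop whitespace-only text from elements that have no child elements."""
    for elem in root.iter():
        if len(elem) == 0 and elem.text is not None and not elem.text.strip():
            elem.text = None  # the element is now truly empty


def reformat(xml_text):
    """Parse, empty the whitespace-only leaves, and re-indent with two spaces."""
    root = ET.fromstring(xml_text)
    strip_blank_leaf_text(root)
    ET.indent(root, space="  ")  # requires Python 3.9+
    # short_empty_elements=False writes <hey></hey> rather than <hey />
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


if __name__ == "__main__":
    hey = ET.fromstring(SAMPLE).find("./bar/hop/hey")
    print(repr(hey.text))  # the text node is a newline, not None
    print(reformat(SAMPLE))
```

Removing the text node changes the document's content, so this is only appropriate when whitespace inside such elements is known to be insignificant.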