html xml bash xmlstarlet

How to ignore an attribute without quotes in XML


I want to count how many times tag1 occurs, given this 123.xml file (streamed from the internet):

<startend>

 <tag1 name=myname>
<date>10-10-10</date>
</tag1 >

 <tag1 name=yourname>
   <date>11-10-10</date>
  </tag1 >

 </startend>

Using: xmlstarlet sel -t -v "count(//tag1)" 123.xml

Output:

AttValue: " or ' expected attributes construct error

How can I make it ignore that the attribute value has no " " around it?


Solution

  • Your input XML/HTML has invalid tags/attributes and should be recovered beforehand:

    xmlstarlet solution:

    xmlstarlet fo -o -R -H -D 123.xml 2>/dev/null | xmlstarlet sel -t -v "count(//tag1)" -n
    

    The output:

    2
    

    Details:

    • fo (or format) - Format XML document(s)
    • -o or --omit-decl - omit xml declaration
    • -R or --recover - try to recover what is parsable
    • -D or --dropdtd - remove the DOCTYPE of the input docs
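    • -H or --html - input is HTML (the HTML parser accepts unquoted attribute values)
    • 2>/dev/null - suppress errors/warnings
    • -n or --nl (in sel) - print a newline after the result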
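
The recover-then-count pipeline above can be wrapped in a small function for reuse. This is only a sketch: count_elements is a hypothetical helper name (not part of xmlstarlet), the sample file is copied from the question into a temporary file, and it assumes xmlstarlet is installed and behaves as described in the answer, which reports 2 for this input.

```shell
#!/usr/bin/env bash
# Sketch only: count_elements is a hypothetical helper, not an xmlstarlet feature.

# count_elements FILE NAME
# Repairs FILE with the HTML parser in recover mode, then counts <NAME> elements.
count_elements() {
  local file="$1" name="$2"
  xmlstarlet fo -o -R -H -D "$file" 2>/dev/null |
    xmlstarlet sel -t -v "count(//${name})" -n
}

# Sample input copied from the question (attribute values are unquoted).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<startend>
 <tag1 name=myname>
<date>10-10-10</date>
</tag1 >
 <tag1 name=yourname>
   <date>11-10-10</date>
  </tag1 >
 </startend>
EOF

count="$(count_elements "$sample" tag1)"
echo "tag1 count: $count"
rm -f "$sample"
```

Because -H switches to the HTML parser, element names are normally lowercased during recovery, so a mixed-case tag would probably need a lowercase name in the XPath expression.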