Tags: json, xml, xml-parsing, xml2js

Parsing an XML file with multiple <?xml> tags using Node.js/Express/xml2js


My problem is as follows:

I'm downloading an XML file using Express.js and then parsing it. Right now the file looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item   [ ]>
<item lang="EN" >
 <country>US</country>
 <doc-number>123123123</doc-number>
 <kind>A1</kind>
 <date>20191017</date>
</item>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item  [ ]>
<item lang="EN" >
 <country>US</country>
 <doc-number>0938409384</doc-number>
 <kind>A2</kind>
 <date>20191018</date>
</item>

I'm using the xml2js library, and I'm having trouble getting the entire document. My code looks something like this:

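const { parseString } = require('xml2js');
parseString(xml, function (err, result) {
  if (err) return console.error(err);
  console.log(result); // the parsed JavaScript object
});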

The output contains only the first piece of XML. How can I parse this so that I get an array of <item>s?

My first idea is to treat the document as a string, split it on each <?xml version="1.0" encoding="UTF-8"?> declaration, and parse each piece separately.
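A minimal sketch of that splitting idea, assuming every document in the file starts with an XML declaration as in the sample. It splits with a zero-width lookahead so each chunk keeps its own `<?xml ...?>` line, then parses the chunks one by one. The helper names `splitXmlDocuments` and `parseAllDocuments` are hypothetical; the parser is passed in as a `(xml, callback)` function (xml2js's `parseString` has that shape) so the helper does not hard-code a dependency.

```javascript
// Split a string that contains several concatenated XML documents into one
// string per document. Assumption: each document begins with an XML
// declaration ("<?xml" followed by whitespace), as in the sample above.
function splitXmlDocuments(text) {
  return text
    .split(/(?=<\?xml\s)/) // zero-width split: every chunk keeps its declaration
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0); // drop empty leftovers
}

// Parse every chunk with a callback-style parser and collect the results.
// `parseString` is injected, e.g. require("xml2js").parseString.
function parseAllDocuments(text, parseString) {
  const chunks = splitXmlDocuments(text);
  return Promise.all(
    chunks.map(
      (chunk) =>
        new Promise((resolve, reject) => {
          parseString(chunk, (err, result) => (err ? reject(err) : resolve(result)));
        })
    )
  );
}

// Example usage (assumes the xml2js package is installed):
//   const { parseString } = require("xml2js");
//   parseAllDocuments(xml, parseString).then((docs) => console.log(docs));
```

With xml2js's default `explicitRoot: true`, each parsed result is an object keyed by its root element name, so `docs.map((d) => d.item)` should give the array of `<item>` objects the question asks for. Any parse error in one chunk rejects the whole promise; use `Promise.allSettled` instead if you would rather keep the chunks that did parse.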

Thanks!


Solution

  • A single XML document cannot contain more than one XML declaration, and the declaration is only allowed at the very start of the document. In addition, a well-formed document must have exactly one root element.

    Therefore, in principle, what you have provided is two separate XML documents concatenated together. Most parsers will either reject it as not well-formed or, as you are seeing with xml2js, stop after the first root element.

    Do you have any control over how the document is generated? If so, make sure the output has a single XML declaration and a single root element wrapping all the items, something like:

    <?xml version="1.0" encoding="utf-8"?>
    <items>
      <item>…</item>
      <item>…</item>
    </items>
    

    If you have no control over how it is generated, you can split the input and parse each document separately, or remove the extra declarations and wrap all the items in one root element to produce a document like the one above.
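The second option (merging everything under one root) might look roughly like the sketch below. It is a plain string transformation, assuming each piece contains at most an XML declaration and a simple DOCTYPE such as `<!DOCTYPE item [ ]>` with no `]` inside its internal subset. The function name `wrapInSingleRoot` and the default root name `items` are hypothetical choices, not something xml2js requires.

```javascript
// Turn several concatenated XML documents into one well-formed document with
// a single declaration and a single root element.
// Assumptions: declarations contain no "?" in their attributes, and DOCTYPEs
// are simple (optional internal subset without "]" inside it).
function wrapInSingleRoot(text, rootName = "items") {
  const body = text
    .replace(/<\?xml\s[^?]*\?>/g, "") // drop every XML declaration
    // drop DOCTYPEs: they may only precede the root, and they name "item",
    // which would no longer match the new wrapper root
    .replace(/<!DOCTYPE[^\[>]*(\[[^\]]*\])?\s*>/g, "")
    .trim();
  return `<?xml version="1.0" encoding="UTF-8"?>\n<${rootName}>\n${body}\n</${rootName}>`;
}
```

With xml2js's default options (`explicitRoot` and `explicitArray` both `true`), passing the wrapped string to `parseString` should yield `result.items.item` as an array with one entry per `<item>`, with the `lang` attribute under each entry's `$` key. Regex-based stripping is fragile for arbitrary XML; it is only reasonable here because the input format is simple and predictable.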