XmlReader how to read or skip a specific child that does not always exist


I have a large XML file that I must read with XmlReader because it cannot be loaded into memory. The XML is structured like this (a reduced version):

<?xml version="1.0" encoding="windows-1252"?>
<Products>
    <Product>
        <Code>A14</Code>
        <Name>Name1</Name>
        <Manufacturer>
            <Name>ManufacturerName</Name>
        </Manufacturer>
        <ProdCategories>
            <ProdCategory>
                <Code>015</Code>
                <Name>ProdCategoryName</Name>
            </ProdCategory>
        </ProdCategories>
        <Barcodes> <!-- note this line -->
        </Barcodes>
     </Product>

     <Product>
        <Code>A15</Code>
        <Name>Name2</Name>
        <Manufacturer>
            <Name>ManufacturerName</Name>
        </Manufacturer>
        <ProdCategories>
            <ProdCategory>
                <Code>016</Code>
                <Name>ProdCategoryName</Name>
            </ProdCategory>
        </ProdCategories>
        <Barcodes>
            <Barcode>
                 <Code>1234567890</Code> <!-- note this line -->
            </Barcode>
        </Barcodes>
     </Product>
</Products>

Note the <Barcode><Code> element: it is missing from the first <Product>.

This is the code I use to read the file and insert the data into a database:

    XmlReader reader = XmlReader.Create("Products.xml");

    reader.MoveToContent();

    do
    {
        reader.ReadToFollowing("Code");
        code = reader.ReadElementContentAsString();

        reader.ReadToFollowing("Name");
        Name = reader.ReadElementContentAsString();

        reader.ReadToFollowing("Name");
        ManufacturerName = reader.ReadElementContentAsString();

        reader.ReadToFollowing("Code");
        ProdCategoryCode = reader.ReadElementContentAsString();

        reader.ReadToFollowing("Code");
        BarcodeCode = reader.ReadElementContentAsString();

        // Here I use the "code", "Name", "ManufacturerName" variables to insert into a database

    } while (reader.Read());

    reader.Close();

All XML tags are present in every product except the children of <Barcodes> (<Barcode><Code>), which appear only in some products. So I cannot jump to the next "Code" with the last ReadToFollowing, because when the barcode is missing I capture the <Code> of the next <Product> instead.

I can't control the XML output and can't modify it (it is third-party).

Is there a way to do something like ReadToFollowing('<Barcodes><Barcode><Code>'), so that I can specify what to look for and skip it if it is not found?

Thank you for your help, and excuse my bad English.


Solution

  • I would suggest pulling each Product element into a tree model, using either XNode.ReadFrom (https://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom(v=vs.110).aspx) or XmlDocument.ReadNode (https://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode(v=vs.110).aspx). You can then use LINQ to XML query methods or XPath to read the data of each Product safely while keeping a low memory footprint.
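
As a rough illustration of the XNode.ReadFrom variant, the sketch below streams one <Product> at a time and reads the optional barcode with null-safe LINQ to XML lookups. The ProductRecord and ProductStream names are hypothetical, and the element names are taken from the sample XML in the question; treat it as a starting point under those assumptions rather than a finished implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical holder for the values the question inserts into the database.
public sealed class ProductRecord
{
    public string Code;
    public string Name;
    public string ManufacturerName;
    public string ProdCategoryCode;
    public string BarcodeCode; // null when <Barcodes> contains no <Barcode><Code>
}

public static class ProductStream
{
    // Yields one record per <Product>; only the current product is
    // materialised as an XElement, so memory use stays low.
    public static IEnumerable<ProductRecord> Read(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "Product")
            {
                // ReadFrom consumes the whole element and leaves the reader
                // on the node that follows </Product>, so no extra Read() here.
                var product = (XElement)XNode.ReadFrom(reader);

                yield return new ProductRecord
                {
                    Code = (string)product.Element("Code"),
                    Name = (string)product.Element("Name"),
                    ManufacturerName = (string)product.Element("Manufacturer")?.Element("Name"),
                    ProdCategoryCode = (string)product.Element("ProdCategories")
                                                      ?.Element("ProdCategory")
                                                      ?.Element("Code"),
                    // A missing <Barcode> or <Code> simply yields null.
                    BarcodeCode = (string)product.Element("Barcodes")
                                                 ?.Element("Barcode")
                                                 ?.Element("Code"),
                };
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

A caller would create the reader with XmlReader.Create("Products.xml"), loop over ProductStream.Read(reader) with foreach, and perform the database insert for each record, checking BarcodeCode for null first. Because every lookup is scoped to the current Product element, a missing barcode can no longer pick up the <Code> of the next product.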