
C# Convert XML file to Multiple Polygon Objects


After merging several MapInfo files into a single Shapefile, and then converting that Shapefile to KML, I got the XML file shown below. My idea is to extract each 'coordinates' section and build a polygon from it.

Other attempted solution:

Having been stuck on this for a long time, I also tried locating each pair of 'coordinates' tags and extracting the text between them with Substring. Unfortunately, given the size of the file (>400 MB), this crude approach is not practical.

Xml file

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document id="root_doc">
    <Schema id="PruebaKML4g.schema">
      <SimpleField name="FID" type="float"/>
      <SimpleField name="REGION" type="float"/>
      <SimpleField name="NOMBRE" type="string"/>
      <SimpleField name="layer" type="string"/>
      <SimpleField name="path" type="string"/>
    </Schema>
    <Document id="PruebaKML4g">
      <name>PruebaKML4g</name>
      <Placemark id="PruebaKML4g.1">
        <ExtendedData>
          <SchemaData schemaUrl="#PruebaKML4g.schema">
            <SimpleData name="FID">5</SimpleData>
            <SimpleData name="REGION">1</SimpleData>
            <SimpleData name="NOMBRE">BAJA CALIFORNIA</SimpleData>
            <SimpleData name="layer">LBS_REGION_1_region</SimpleData>
            <SimpleData name="path">C:/Files/LBS_REGION_1_region.shp</SimpleData>
          </SchemaData>
        </ExtendedData>
        <MultiGeometry>
          <Polygon>
            <outerBoundaryIs>
              <LinearRing>
                <coordinates>
                  -105.258751,21.782028,0
                  -105.247174,21.81173,0
                  -105.241826,21.809401,0
                  -105.236994,21.806241,0
                  -105.232822,21.802344,0
                  -105.229439,21.79783,0
                  -105.228552,21.796052,0
                  -105.228974,21.795899,0
                  -105.230294,21.79522,0
                  -105.231872,21.79511,0
                  -105.234048,21.79431,0
                  -105.235131,21.794083,0
                  -105.236824,21.793857,0
                  -105.238518,21.793295,0
                  -105.239365,21.792389,0
                  -105.240327,21.790914,0
                  -105.242379,21.79046,0
                  -105.243829,21.790459,0
                  -105.245644,21.788766,0
                  -105.247331,21.785709,0
                  -105.24817,21.783115,0
                  -105.248701,21.780372,0
                  -105.258751,21.782028,0
                </coordinates>
              </LinearRing>
            </outerBoundaryIs>
          </Polygon>        
...
          <Polygon>
            <outerBoundaryIs>
              <LinearRing>
                <coordinates>
                  -103.704559,20.767933,0
                  -103.702714,20.773608,0
                  -103.701694,20.77322,0
                  -103.700762,20.772672,0
                  -103.699944,20.77198,0
                  -103.699267,20.771165,0
                  -103.698751,20.770252,0
                  -103.698411,20.769268,0
                  -103.698258,20.768243,0
                  -103.698297,20.76721,0
                  -103.704559,20.767933,0
                </coordinates>
              </LinearRing>
            </outerBoundaryIs>
          </Polygon>
          <Polygon>
            <outerBoundaryIs>
              <LinearRing>
                <coordinates>
                  -105.160778,20.766278,0
                  -105.162411,20.77201,0
                  -105.161328,20.77219,0
                  -105.160228,20.77219,0
                  -105.159145,20.77201,0
                  -105.158111,20.771656,0
                  -105.157159,20.771139,0
                  -105.156317,20.770474,0
                  -105.15561,20.769682,0
                  -105.15506,20.768786,0
                  -105.160778,20.766278,0
                </coordinates>
              </LinearRing>
            </outerBoundaryIs>
          </Polygon>          
          <Polygon>
            <outerBoundaryIs>
              <LinearRing>
                <coordinates>
                  -117.125814,32.524285,0
                  -117.125516,32.524512,0
                  -117.125142,32.524428,0
                  -117.124876,32.524169,0
                  -117.124754,32.524513,0
                  -117.124784,32.525361,0
                </coordinates>
              </LinearRing>
            </outerBoundaryIs>
          </Polygon>
        </MultiGeometry>
      </Placemark>
    </Document>
  </Document>
</kml>

I tried to use the following code:

Main

public T DeserializeToObject<T>(string filepath) where T : class
{
    System.Xml.Serialization.XmlSerializer xmlSerializer = new System.Xml.Serialization.XmlSerializer(typeof(T));

    using (StreamReader streamReader = new StreamReader(filepath))
    {
        return (T)xmlSerializer.Deserialize(streamReader);
    }
}

private void Form1_Load(object sender, EventArgs e)
{
    String pathKml = @"C:\PruebaKML4g.kml";

    List<Kml> elementsList = DeserializeToObject<List<Kml>>(pathKml);
}

Kml.cs

/*[XmlRoot(ElementName = "kml")] changed by Mike Clark suggestion*/
[XmlRoot(ElementName = "kml", Namespace = "http://www.opengis.net/kml/2.2")]
public class Kml
{
    public List<Polygon> polygons = new List<Polygon>();
}

public class Polygon
{
    [XmlAttribute("outerBoundaryIs")]
    public String outerBoundaryIs { get; set; }

    [XmlAttribute("linearRing")]
    public String linearRing { get; set; }

    [XmlAttribute("coordinates")]
    public String coordinates { get; set; }
}

However, the SimpleData elements in the XML file appear to be interfering with my parsing, producing the following error:

InvalidOperationException: <kml xmlns='http://www.opengis.net/kml/2.2'> was not expected.

Any clue about where my mistake is will be appreciated.


Solution

  • With all those conversion steps, perhaps the XML file is malformed? Or perhaps so much data is causing a memory error? Try parsing the file with a SAX parser, which streams the document with low memory use and will report any syntax errors buried deep in the file. If you have Python installed, run:

    python -c "import xml.sax;p=xml.sax.make_parser();p.parse(r'yourfile.xml')"

    Change yourfile.xml to the path and filename of your XML file.

    If it prints nothing, the file is well-formed. If it prints an error, use the line:column information in the message to locate the problem in your XML.
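
If Python is not available, the same streaming check can be sketched in C# with XmlReader, which also reads node by node without building a tree. This is only a minimal sketch: the class name XmlWellFormedness and the method Check are made up for illustration, and DTDs are simply ignored.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlWellFormedness
{
    // Hypothetical helper: reads every node without building a document tree.
    // Returns null when the input is well-formed, otherwise a message
    // containing the line and column reported by the parser.
    public static string Check(TextReader input)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        try
        {
            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                while (reader.Read())
                {
                    // Nothing to do: we only want the parser to visit each node.
                }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return $"Line {ex.LineNumber}, column {ex.LinePosition}: {ex.Message}";
        }
    }
}
```

For the real file you would pass something like new StreamReader(@"C:\PruebaKML4g.kml") and print the returned message if it is not null.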


    Part 2:

    List<Kml> elementsList = DeserializeToObject<List<Kml>>(pathKml);

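    might be wrong. An XML document has exactly one root element (in this case, <kml>), so I think deserializing into a list of Kml instances will not make sense to the serializer. For List<Kml>, XmlSerializer expects a root element named ArrayOfKml, which is why it reports that <kml> was not expected. Try this instead: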

    Kml root = DeserializeToObject<Kml>(pathKml);

    But that is a simple problem compared to the next one: I think your C# class structure needs to mirror the hierarchical structure of the XML. The polygons sit under this hierarchy:

    kml > Document > Document > Placemark > MultiGeometry

    which means you would need something like

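    using System.Xml.Serialization;

    // XmlSerializer only maps public members; each field is named after its XML element.
    [XmlRoot("kml", Namespace = "http://www.opengis.net/kml/2.2")]
    public class Kml {
      public Document Document;        // outer <Document>
    }
    public class Document {
      public Document Document;        // nested <Document>
      public Placemark Placemark;
    }
    public class Placemark {
      public Polygon[] MultiGeometry;  // the <Polygon> children of <MultiGeometry>
    }
    public class Polygon {
      public OuterBoundaryIs outerBoundaryIs;
    }
    public class OuterBoundaryIs {
      public LinearRing LinearRing;
    }
    public class LinearRing {
      public string coordinates;       // raw text of <coordinates>
    }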
    

    Then you would need something like

    // 'root' is the Kml instance returned by DeserializeToObject<Kml>(pathKml)
    var polygons = root.Document.Document.Placemark.MultiGeometry;
    for(int i = 0; i < polygons.Length; i++) {
      var polygon = polygons[i];
      string coordinates = polygon.outerBoundaryIs.LinearRing.coordinates;
      // do something with coordinates
    }
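
As for "do something with coordinates": the text of a KML coordinates element is a whitespace-separated list of lon,lat[,alt] tuples, so it can be split into numeric pairs before you build a polygon. A rough sketch follows; the KmlCoordinates class and its Parse method are hypothetical names, the altitude is discarded, and value tuples assume C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class KmlCoordinates
{
    // Hypothetical helper: turns the text of a <coordinates> element
    // into a list of (longitude, latitude) pairs. Altitude is ignored.
    public static List<(double Lon, double Lat)> Parse(string text)
    {
        var points = new List<(double Lon, double Lat)>();

        // A null separator array splits on any whitespace (spaces, tabs, newlines).
        string[] tuples = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        foreach (string tuple in tuples)
        {
            string[] parts = tuple.Split(',');
            if (parts.Length < 2)
            {
                throw new FormatException("Expected 'lon,lat[,alt]' but found: " + tuple);
            }

            // InvariantCulture keeps '.' as the decimal separator on any machine.
            double lon = double.Parse(parts[0], CultureInfo.InvariantCulture);
            double lat = double.Parse(parts[1], CultureInfo.InvariantCulture);
            points.Add((lon, lat));
        }

        return points;
    }
}
```

Inside the loop you could then call KmlCoordinates.Parse(coordinates) and feed the resulting points to whatever polygon type you are using.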
    

    By the way, a better fit for this kind of task would be XPath, which avoids the need to model the XML structure with C# classes. It takes a little practice and research to craft an XPath query, but the resulting code is cleaner, and it is a good skill to have some experience with. For more on XPath, see:

    https://stackoverflow.com/a/16012736/11611195

    https://learn.microsoft.com/en-us/dotnet/standard/data/xml/select-nodes-using-xpath-navigation
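
To give an idea of what the XPath route might look like for this file, here is a minimal sketch using XPathDocument. The class name KmlXPath, the method SelectCoordinates and the prefix "k" are my own choices for illustration; the prefix just has to be bound to the KML namespace, because the elements in the file are namespace-qualified and an unprefixed query would match nothing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class KmlXPath
{
    private const string KmlNamespace = "http://www.opengis.net/kml/2.2";

    // Hypothetical helper: returns the trimmed text of every outer-boundary
    // <coordinates> element found anywhere in the document.
    public static List<string> SelectCoordinates(TextReader input)
    {
        var document = new XPathDocument(input);
        XPathNavigator navigator = document.CreateNavigator();

        // Bind an arbitrary prefix to the default namespace used by the KML file.
        var namespaces = new XmlNamespaceManager(navigator.NameTable);
        namespaces.AddNamespace("k", KmlNamespace);

        var result = new List<string>();
        XPathNodeIterator iterator = navigator.Select(
            "//k:Polygon/k:outerBoundaryIs/k:LinearRing/k:coordinates", namespaces);

        while (iterator.MoveNext())
        {
            result.Add(iterator.Current.Value.Trim());
        }

        return result;
    }
}
```

Keep in mind that XPathDocument still loads the whole document into memory, so with a file over 400 MB this may or may not be workable on your machine; if it is not, a forward-only XmlReader is the lower-memory option.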