
Issues querying a Google sitemap.xml with LINQ to XML


I have a LINQ to XML query that returns nothing when the urlset element of a Google sitemap I created carries attributes, but works fine when no attributes are present.

Can't query:

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
  <loc>http://www.foo.com/index.htm</loc>
  <lastmod>2010-05-11</lastmod>
  <changefreq>monthly</changefreq>
  <priority>1.0</priority>
</url>
<url>
  <loc>http://www.foo.com/about.htm</loc>
  <lastmod>2010-05-11</lastmod>
  <changefreq>monthly</changefreq>
  <priority>1.0</priority>
 </url>
</urlset>

Can query:

<?xml version="1.0" encoding="utf-8"?>
<urlset>
<url>
  <loc>http://www.foo.com/index.htm</loc>
  <lastmod>2010-05-11</lastmod>
  <changefreq>monthly</changefreq>
  <priority>1.0</priority>
</url>
<url>
  <loc>http://www.foo.com/about.htm</loc>
  <lastmod>2010-05-11</lastmod>
  <changefreq>monthly</changefreq>
  <priority>1.0</priority>
 </url>
</urlset>

The query:

XDocument xDoc = XDocument.Load(@"C:\Test\sitemap.xml");
var sitemapUrls = (from l in xDoc.Descendants("url")
                           select l.Element("loc").Value);
foreach (var item in sitemapUrls)   
{       
  Console.WriteLine(item.ToString());
}

What would be the reason for this?


Solution

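  • See the xmlns="..." attribute on the urlset element? It declares a default namespace, so urlset and every element beneath it (url, loc, and so on) belong to that namespace. Descendants("url") looks for an element named url in no namespace, which is why it matches nothing. You need to qualify the element names with the namespace. Test the following modification of your code: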

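    XDocument xDoc = XDocument.Load(@"C:\Test\sitemap.xml");
    // Must match the xmlns value declared on <urlset> exactly.
    XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
    
    // ns + "url" builds an XName that carries both the namespace and the local name.
    var sitemapUrls = (from l in xDoc.Descendants(ns + "url")
                        select l.Element(ns + "loc").Value);
    foreach (var item in sitemapUrls)
    {
        // item is already a string, so no ToString() call is needed.
        Console.WriteLine(item);
    }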
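
To see the difference between the two lookups side by side, here is a minimal, self-contained sketch. It assumes a trimmed inline copy of the sitemap from the question instead of the file on disk, and it reads the namespace from the root element rather than hard-coding the URI. The class name SitemapNamespaceDemo is only illustrative. Reading the namespace from the root is one possible variation, not the only way to do it.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class SitemapNamespaceDemo
{
    static void Main()
    {
        // Trimmed inline copy of the sitemap from the question (assumption:
        // only the loc elements are kept to keep the example short).
        const string xml = @"<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">
  <url><loc>http://www.foo.com/index.htm</loc></url>
  <url><loc>http://www.foo.com/about.htm</loc></url>
</urlset>";

        XDocument xDoc = XDocument.Parse(xml);

        // Unqualified name: matches only elements in no namespace, so nothing is found.
        int unqualifiedCount = xDoc.Descendants("url").Count();

        // Take the namespace from the root element instead of hard-coding the URI.
        XNamespace ns = xDoc.Root.Name.Namespace;

        // The (string) cast yields null when <loc> is missing, rather than
        // throwing as .Value would; such entries are filtered out.
        var locs = xDoc.Descendants(ns + "url")
                       .Select(u => (string)u.Element(ns + "loc"))
                       .Where(loc => loc != null)
                       .ToList();

        Console.WriteLine(unqualifiedCount); // prints 0
        Console.WriteLine(locs.Count);       // prints 2
        foreach (var loc in locs)
        {
            Console.WriteLine(loc);
        }
    }
}
```

Taking the namespace from xDoc.Root.Name.Namespace means the same query also works on the attribute-free sitemap, because the root's namespace is then XNamespace.None and ns + "url" is equivalent to the plain name "url".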