This is what my website's sitemap.xml looks like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://example.com/</loc>
<lastmod>2013-04-02T12:45:31+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>1</priority>
</url>
<url>
<loc>http://example.com/2013/wordpress-customize-login-page/</loc>
<lastmod>2013-03-01T12:06:00+00:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
And here's the original sitemap. First, I made sure the XML markup itself was valid, then I ran the sitemap through the xmlcheck and sitemapxml validators.
The two sitemap validators gave this error:
Fatal Error 4: Start tag expected, '<' not found in http://example.com/sitemap.xml on line 1 column 1
As I see it, nothing's missing. Not sure what I am doing wrong. (Googling didn't help either.)
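(For anyone who wants to reproduce the well-formedness check locally, here is a minimal sketch using Python's standard library; the local file name is an assumption:)

import xml.etree.ElementTree as ET

# Parsing the local copy of the sitemap raises ParseError on malformed XML
try:
    ET.parse("sitemap.xml")  # assumed local file name
    print("sitemap.xml is well-formed XML")
except ET.ParseError as err:
    print(f"Not well-formed: {err}")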
UPDATE: As stated in the comments, the sitemap validators in question have trouble parsing gzipped sitemaps (in the OP's case, Amazon S3 serves the sitemap as a gzipped text response).
I'm now in the camp that thinks this is a server issue, and I have some data to back that up (so I didn't edit my other answer). Here is what I did (my original point about making the sitemap "more valid" still stands below): I copied your file (by viewing the source in the browser), created a sitemap.xml, uploaded it to my S3 bucket, and confirmed that all the validators mentioned in this question consider it valid. Then I used wget to fetch both my uploaded copy and your live sitemap. Here is what I found (my bucket name is obscured as [myexamples3bucket.example], but you can see from the IP that it resolves to an AWS address):
:~# wget http://[myexamples3bucket.example]/original.xml
--2013-04-02 13:26:42-- http://[myexamples3bucket.example]/original.xml
Resolving [myexamples3bucket.example]... 207.171.189.80
Connecting to [myexamples3bucket.example]|207.171.189.80|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4578 (4.5K) [text/xml]
Saving to: `original.xml'
100%[======================================>] 4,578 --.-K/s in 0.002s
2013-04-02 13:26:42 (1.97 MB/s) - `original.xml' saved [4578/4578]
Then I tried to fetch your sitemap:
:~# wget http://aahank.com/sitemap.xml
--2013-04-02 13:26:55-- http://aahank.com/sitemap.xml
Resolving aahank.com... 178.236.4.60
Connecting to aahank.com|178.236.4.60|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 766 [application/xml]
Saving to: `sitemap.xml'
100%[======================================>] 766 --.-K/s in 0s
2013-04-02 13:26:55 (144 MB/s) - `sitemap.xml' saved [766/766]
The contents of these two files are very different. While the copied sitemap looks exactly like what you would expect, your original sitemap looks like this:
^_�^H^@^@^@^@^@^@^CÍM�Ú0^P����^_^P×j��^O>,����=�J�ï¿ï¿½^Rq��1�^XY�Lnw���^R�^V�l
�jO$+U���:z�s�i�2V�Ë���u�]��Þ8_;����EcÑ9È[�M����^BwJjhw��-�4^Z^\ZJ��0I^O�0^Q�!���9��^^^]�1;^N�^]����Ǫ^Z̪^_��˪ڪB$Aɪ^M�^DmHcT-
�Ns,ªAÚª^Z�a�T�XÄV5��^[^^����A�F9^KTpÆÖe�AÔ���2È^_�$
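The very first bytes give it away: ^_ is 0x1F and ^H is 0x08, and together with the 0x8B between them this is the standard gzip header. You can confirm this programmatically with a minimal Python sketch (the URL is a placeholder for the real sitemap address):

import gzip
import urllib.request

URL = "http://example.com/sitemap.xml"  # placeholder sitemap URL

# urllib neither requests nor decompresses gzip on its own,
# so whatever the server sends is exactly what we see here.
raw = urllib.request.urlopen(URL).read()
if raw[:2] == b"\x1f\x8b":  # gzip magic number
    print("Served as raw gzip data")
    print(gzip.decompress(raw)[:80])  # start of the decompressed XML
else:
    print("Served as plain text")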
This points to Amazon S3 being the culprit: the sitemap is being served as raw gzip data. I'm offering this up in case anyone else can figure out how to fix it. Good luck!
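If the object really is stored gzipped, one possible fix is to rewrite its metadata so that clients are told to decompress it. Here is a sketch using boto3; the bucket and key names are assumptions, and I have not tested this against the setup in question:

import boto3

s3 = boto3.client("s3")

# Copy the object onto itself, replacing its metadata so that
# Content-Encoding tells clients the body is gzip-compressed.
s3.copy_object(
    Bucket="myexamples3bucket.example",  # assumed bucket name
    Key="sitemap.xml",
    CopySource={"Bucket": "myexamples3bucket.example", "Key": "sitemap.xml"},
    MetadataDirective="REPLACE",  # required when changing metadata in place
    ContentType="application/xml",
    ContentEncoding="gzip",
)

Alternatively, simply re-uploading the file uncompressed should work just as well.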
As for making it "more valid": using the official definition of a valid sitemap, I made the following small changes to your sitemap, uploaded it to my S3 bucket, and tested it against the two sites you linked to; it now passes:
<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
Everything else is unchanged. The error messages on those two sites are very unhelpful, but the important additions are the xmlns:xsi and xsi:schemaLocation attributes, which tell a validator which schema the document is intended to follow. I would expect crawlers to assume these, but in the case of the two linked services, the absence of these attributes technically makes the document invalid.
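If you would rather run this check locally than depend on those two services, here is a sketch that validates the sitemap against the official schema using the third-party lxml package (the local file name is an assumption):

from lxml import etree

# Load the official sitemap schema referenced in xsi:schemaLocation above
schema = etree.XMLSchema(
    etree.parse("http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd")
)

doc = etree.parse("sitemap.xml")  # assumed local copy of the sitemap
schema.assertValid(doc)  # raises DocumentInvalid with details on failure
print("sitemap.xml validates against the official schema")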