Tags: web-crawler, sitemap, baidu

Baidu Sitemap Files Fail to Crawl


I have submitted sitemap files for my website to the Chinese search engine Baidu.

My Sitemap consists of:

1) 16 sitemap files compressed with gzip; each file is under 10 MB and contains fewer than 50K URLs.
2) a sitemap index with links to the gzipped sitemap files above (see the sketch after this list).
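For reference, this is roughly how such an index can be generated; a minimal Python sketch, where the domain and file names are placeholders rather than my actual site:

    # Minimal sketch: build a sitemap index that points at 16 gzipped sitemaps.
    # The domain and file names are placeholders, not the real site.
    from datetime import date

    SITEMAP_COUNT = 16
    BASE_URL = "http://www.example.com"  # placeholder domain

    def build_sitemap_index() -> str:
        lastmod = date.today().isoformat()
        entries = []
        for i in range(1, SITEMAP_COUNT + 1):
            entries.append(
                "  <sitemap>\n"
                f"    <loc>{BASE_URL}/sitemap-{i}.xml.gz</loc>\n"
                f"    <lastmod>{lastmod}</lastmod>\n"
                "  </sitemap>"
            )
        return (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(entries)
            + "\n</sitemapindex>\n"
        )

    if __name__ == "__main__":
        with open("sitemap-index.xml", "w", encoding="utf-8") as f:
            f.write(build_sitemap_index())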

What I see is very strange behavior from Baidu: it marks some of my sitemap files as Failed Crawl (抓取失败), while the others appear to be processed (正常, normal).
When I re-submit the same set of sitemaps, other files randomly become Failed Crawl, while the previously failed ones may be processed without failure.

Apart from Baidu's message "Failed Crawl" (抓取失败), I am not able to find out what is wrong with my sitemap.
It is driving me crazy because Baidu marks files as "Failed Crawl" quite randomly.

Can anyone suggest what is wrong?
Or where can I see the exact error message from Baidu explaining the "Failed Crawl"?

In the end, after many tries, Baidu marks the sitemap index file as "Failed Crawl" as well (although after submission it is always first in 'waiting' status and then crawled as 'normal', 正常).

Note: the same files are processed successfully by Google, Yahoo/Bing, and Yandex!

Please see a screenshot here: https://drive.google.com/open?id=0BzDlz6j9c35WWkdwb3F6LW9zazA


Solution

  • This error is most common with Bing/Baidu/Yandex. It is usually caused by the sitemap being requested over HTTPS. Try forcing HTTP for the sitemap and submit it again.
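    As a quick check before resubmitting, you can verify that each sitemap is actually reachable over plain HTTP and is not redirected back to HTTPS. This is a minimal Python sketch; the URLs are placeholders for your own sitemap locations:

        # Sanity check: fetch each gzipped sitemap over plain HTTP and report
        # whether the final URL (after redirects) ended up on HTTPS.
        # The URLs below are placeholders, not the asker's real site.
        import gzip
        import urllib.request

        SITEMAP_URLS = [f"http://www.example.com/sitemap-{i}.xml.gz" for i in range(1, 17)]

        for url in SITEMAP_URLS:
            with urllib.request.urlopen(url, timeout=30) as resp:
                final_url = resp.geturl()      # where any redirects ended up
                body = resp.read()
            xml = gzip.decompress(body)        # raises if the gzip is corrupt
            note = "redirected to HTTPS" if final_url.startswith("https://") else "served over HTTP"
            print(f"{url}: {len(xml)} bytes uncompressed, {note}")

    If a sitemap URL immediately redirects to HTTPS, that would match the intermittent "Failed Crawl" behavior described above.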