xml, search-engine, sitemap

How to include a page in sitemap.xml that requires parameters


A catalogue page, item.php, uses two parameters (t and id) to look up an item, e.g. item.php?t=u&id=11. There are a couple of thousand items in the catalogue. Without parameters, the page is merely a redirect and should not be indexed.

For search engine crawlers, is it appropriate to list each of these items individually in the XML sitemap as parameterized URLs, or is a different approach more appropriate? The items are largely fixed: they are occasionally added or deleted, but at most monthly.


Solution

  • You should list the thousands of URLs with parameters in the XML sitemap:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://www.example.com/item.php?t=u&amp;id=11</loc></url>
      <url><loc>https://www.example.com/item.php?t=u&amp;id=12</loc></url>
      <url><loc>https://www.example.com/item.php?t=u&amp;id=13</loc></url>
      ...
    </urlset>
    

    Note that the & in the URL has been escaped to &amp; in the XML.
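
With a couple of thousand items, the file would normally be generated rather than written by hand. The following is a minimal Python sketch of that idea, not a definitive implementation: `BASE_URL`, `build_sitemap`, and the hard-coded item list are hypothetical names for illustration, and in practice the (t, id) pairs would come from the catalogue's database. It relies on the standard library's `escape` to turn `&` into `&amp;`.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Hypothetical base URL, taken from the example above.
BASE_URL = "https://www.example.com/item.php"


def build_sitemap(items):
    """Return sitemap XML for an iterable of (t, id) pairs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for t, item_id in items:
        # Build the query string with a fixed parameter order: t first, then id.
        url = f"{BASE_URL}?{urlencode({'t': t, 'id': item_id})}"
        # escape() converts & to &amp; so the URL is valid inside XML.
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed sample data; replace with a query against the real catalogue.
    print(build_sitemap([("u", 11), ("u", 12), ("u", 13)]))
```

A single sitemap file may hold up to 50,000 URLs, so a catalogue of this size fits comfortably in one file. Because the items change at most monthly, regenerating the file whenever an item is added or deleted should be enough.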

    Search engines don't care which files generate pages; they care about unique content. To a search engine, item.php isn't a single page: it powers thousands of pages.

    On its own, an XML sitemap is not sufficient to get these thousands of URLs indexed and ranked well. A sitemap prompts search engines to crawl the pages, but search engines rarely choose to index URLs that are found only in a sitemap. To get the pages indexed and ranked well, you need to link to them. That usually means each product page should link to several related products. See The Sitemap Paradox.

    I would also recommend using canonical tags to tell search engines which URLs you prefer. With multiple parameters, the order of the parameters can cause unwanted duplication: item.php?t=u&id=11 and item.php?id=11&t=u serve the same content under two URLs. See Should query strings be included or removed from the canonical tag?
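
One way to keep the canonical URL stable is to rebuild the query string in a fixed parameter order before emitting the tag. The sketch below assumes the preferred order is t, then id, matching the sitemap example; `PARAM_ORDER`, `canonical_url`, and `canonical_tag` are hypothetical names, and dropping every other parameter is an assumption that may not suit every site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed preferred order, matching the URLs listed in the sitemap.
PARAM_ORDER = ("t", "id")


def canonical_url(url):
    """Rebuild the URL with only the known parameters, in a fixed order."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    # Parameters not named in PARAM_ORDER are dropped; the fragment is too.
    ordered = [(key, params[key]) for key in PARAM_ORDER if key in params]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(ordered), "")
    )


def canonical_tag(url):
    """Return a <link rel="canonical"> element with the href HTML-escaped."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_tag("https://www.example.com/item.php?id=11&t=u"))
```

Both parameter orders then resolve to the same canonical URL, which is also the form listed in the sitemap.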