Tags: asp.net, sitecore, google-search

Sitecore multisite: child nodes of one site are accessible through the other site's host name


I have a Sitecore multisite setup.

I'm currently struggling with "duplicate content syndrome": Google's bots index my sites and are able to reach the content of one site through the other site.

This means Google finds the same content on two different host names, which lowers the sites' ranking in Google search results.

The reason it finds duplicate content is that I can access a child node of the other site (not the one I'm currently browsing) just by typing its name in the URL.

This is my web.config setup of the sites:

    <site name="website2" hostName="local.domain.dk" virtualFolder="/" physicalFolder="/"
          rootPath="/sitecore/content/talk" startItem="/" database="web" domain="extranet"
          allowDebug="true" cacheHtml="true" htmlCacheSize="10MB" registryCacheSize="0"
          viewStateCacheSize="0" xslCacheSize="5MB" filteredItemsCacheSize="2MB"
          enablePreview="true" enableWebEdit="true" enableDebugger="true" disableClientData="false"/>

    <site name="website" virtualFolder="/" physicalFolder="/"
          rootPath="/sitecore/content/home" startItem="/" database="web" domain="extranet"
          allowDebug="true" cacheHtml="true" htmlCacheSize="10MB" registryCacheSize="0"
          viewStateCacheSize="0" xslCacheSize="5MB" filteredItemsCacheSize="2MB"
          enablePreview="true" enableWebEdit="true" enableDebugger="true" disableClientData="false"/>

Even though I set rootPath to the root of each site, I can still access the child node at local.domain.dk/ydelser/integration by typing local.domain-talk/integration.

Any help would be much appreciated!


Solution

  • Make sure you have set the hostName and targetHostName attributes in your <site> configuration. This ensures that when you link to content belonging to another site, the link is rendered as a full URL that includes the host name.

    hostName: The host name of the incoming url. May include wildcards (ex. www.site.net, *.site.net, *.net, pda.*, print.*.net)
              It's possible to set more than one mask by using the '|' symbol as a separator (ex. pda.*|print.*.net)
    targetHostName: The host name to use when generating URLs to items within this site from the context of another site.
              If the targetHostName attribute is absent, Sitecore uses the value of the hostName attribute instead.
              Used only when the value of the Rendering.SiteResolving setting is true.
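
    Applied to the two sites from the question, the definitions could look roughly like the sketch below. It is a minimal sketch, not a drop-in replacement: the cache and debugging attributes from the original definitions are left out for brevity, and local.other-domain.dk is a hypothetical placeholder for whatever host name the "website" site should answer on. As far as I know, Sitecore uses the first <site> entry that matches the incoming request, so sites with specific host names are usually listed before any catch-all entry.

    ```xml
    <sites>
      <!-- Specific host names first: the first matching <site> wins. -->
      <site name="website2" hostName="local.domain.dk" targetHostName="local.domain.dk"
            virtualFolder="/" physicalFolder="/" rootPath="/sitecore/content/talk"
            startItem="/" database="web" domain="extranet" />

      <!-- HYPOTHETICAL host name: replace local.other-domain.dk with the
           real host name of the "website" site. -->
      <site name="website" hostName="local.other-domain.dk" targetHostName="local.other-domain.dk"
            virtualFolder="/" physicalFolder="/" rootPath="/sitecore/content/home"
            startItem="/" database="web" domain="extranet" />
    </sites>
    ```

    With targetHostName set on both sites, a link from a page in one site to an item in the other should be generated with the other site's host name instead of a relative path on the current host.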
    

    Also make sure Rendering.SiteResolving is set to true:

      <!--  SITE RESOLVING
            While rendering item links, some items may belong to a different site. Setting this to true
            makes LinkManager try to resolve the target site in order to use the right host name.
            Default value: true
      -->
      <setting name="Rendering.SiteResolving" value="true" />
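
    If you would rather not edit web.config directly, Sitecore versions that support configuration include files (App_Config/Include) let you override the setting from a separate patch file. A minimal sketch, assuming your version supports include files; the file name SiteResolving.config is hypothetical:

    ```xml
    <!-- App_Config/Include/SiteResolving.config (hypothetical file name) -->
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <sitecore>
        <settings>
          <!-- Targets the existing setting by its name and overwrites its value attribute. -->
          <setting name="Rendering.SiteResolving">
            <patch:attribute name="value">true</patch:attribute>
          </setting>
        </settings>
      </sitecore>
    </configuration>
    ```

    Keeping the override in its own file means it survives if web.config is replaced during an upgrade.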
    

    You will always be able to access a page by its full path, so, as Jens says, add canonical link tags. Once you've resolved the cross-site linking and the canonical link issues, Google's bots should only be following clean links.
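
    A canonical link tells search engines which URL is the preferred one when the same content can be reached at several addresses. A minimal sketch of what the rendered <head> could contain for the integration page, assuming its preferred address is http://local.domain.dk/ydelser/integration. In practice the layout would emit the current item's URL (generated with site resolving enabled) rather than a hard-coded value:

    ```xml
    <head>
      <!-- Should be identical on every copy of the page, including the copy reached
           through the other site's host name, so the duplicates point at one URL. -->
      <link rel="canonical" href="http://local.domain.dk/ydelser/integration" />
    </head>
    ```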