While setting the canonical tag, I found out that I am not getting all the juice out of the canonical's purpose...
GIVEN
Currently, ugly URLs like website.org/juice?ln=de
are made nice via Apache and become reachable in a more user-friendly way, like website.org/de/juice.
Now, in this multilingual website, I want consistency: all pages should have their language as a folder. I want the search engines to remember and prefer /language/page
as opposed to the ugly counterpart /page?ln=language.
Question 1: Am I on the right track so far in how I want to use the canonical tag to communicate this to the search engines?
CURRENTLY the code removes the unnecessary query string so that canonical URLs are short:
when URL = http://website.org/de/juice?ln=whatever
canonical URL = http://website.org/de/juice
So far so good, BUT it does not fix the old URLs still roaming the net and sitting in old search engine caches, and thus the following situation goes wrong:
when URL = http://website.org/juice?ln=xyz (missing language folder)
then canonical becomes = http://website.org/juice (whereas it should be http://website.org/xyz/juice)
Question 2: What should I add to my code to improve/foolproof my canonical so that it recognises situations where no language folder is set?
<?php
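$domain = $_SERVER['HTTP_HOST']; # domain, like website.org
# $extensions is not defined in this snippet; presumably it holds the request path plus query string
$qsIndex = strpos($extensions, '?'); # position of the query string (?ln=xyz), if any
$pageclean = $qsIndex !== FALSE ? substr($extensions, 0, $qsIndex) : $extensions; # strip the query string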
$canonical = "http://" . $domain . $pageclean;
?>
<html><head><link rel="canonical" href="<?=$canonical?>"></head>...
Note: languages can be things like {de, nl, es, it, en, la, ... but also zh-CN, zh-TW}, so basically whatever comes after ?ln=
Well, does your page (not the URL, but the page itself) know what language it is? If yes, just add the language information to the canonical URL. If the page does not know what language it is (and you have no way to find out), you will just have to choose a default language. That is not perfect from an SEO point of view, but it is much better than having these old URLs stay/stray around.
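A minimal sketch of that idea, assuming the page can work out its language from the folder first and from the ln parameter second. The function name canonicalUrl, the $supported whitelist and the $default fallback are hypothetical, not part of the original code:

```php
<?php
// Hypothetical helper: builds a canonical URL of the form /language/page.
// $supported (allowed language codes) and $default (fallback language)
// are assumptions made for this illustration.
function canonicalUrl(string $host, string $requestUri, array $supported, string $default): string
{
    // Split the request into path and query string.
    $path  = (string) parse_url($requestUri, PHP_URL_PATH);
    $query = (string) parse_url($requestUri, PHP_URL_QUERY);
    parse_str($query, $params);

    // Path segments without empty entries, e.g. "/de/juice" -> ["de", "juice"].
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    if ($segments && in_array($segments[0], $supported, true)) {
        // 1. The language folder is present: it wins, ?ln= is ignored.
        $lang = array_shift($segments);
    } elseif (isset($params['ln']) && is_string($params['ln'])
        && in_array($params['ln'], $supported, true)) {
        // 2. No folder, but a usable ?ln= parameter.
        $lang = $params['ln'];
    } else {
        // 3. Nothing usable: fall back to the default language.
        $lang = $default;
    }

    return 'http://' . $host . '/' . $lang . '/' . implode('/', $segments);
}
```

With a whitelist of ['de', 'nl', 'zh-CN'] and 'en' as the default, /juice?ln=nl would map to http://website.org/nl/juice, while an unknown value such as ?ln=xyz would fall back to http://website.org/en/juice. Checking against a whitelist also keeps arbitrary ln values out of the canonical URL.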
As a parachute, you can use the new
<link rel="alternate" ...>
tag to soften that effect.
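As a rough illustration, assuming the tag meant here is the rel="alternate" annotation with an hreflang attribute, the links for each language version could be generated like this. The helper name alternateLinks and its parameters are hypothetical:

```php
<?php
// Hypothetical helper: emits one <link rel="alternate" hreflang="..."> line
// per language version of a page. Assumes every language lives under
// /language/page, as described in the question.
function alternateLinks(string $host, string $page, array $languages): string
{
    $lines = [];
    foreach ($languages as $lang) {
        $href = 'http://' . $host . '/' . $lang . '/' . ltrim($page, '/');
        $lines[] = sprintf(
            '<link rel="alternate" hreflang="%s" href="%s">',
            htmlspecialchars($lang, ENT_QUOTES),
            htmlspecialchars($href, ENT_QUOTES)
        );
    }
    // One tag per line, ready to be echoed inside <head>.
    return implode("\n", $lines);
}
```

Echoing alternateLinks('website.org', 'juice', ['de', 'nl', 'zh-CN']) inside the head would list the German, Dutch and Chinese versions next to the canonical link.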