I can't get file_get_contents() to return the expected page when the URL contains an 'Ö' character.
$url = "https://se.timeedit.net/web/liu/db1/schema/s/s.html?tab=3&object=CM_949A11_1534_1603_DAG_DST_50_ÖVRIGT_1_1&type=subgroup&startdate=20150101&enddate=20300501";
print file_get_contents($url);
How do I make file_get_contents() work as expected on this URL?
I have tried the following solutions without success:
1.
print rawurlencode(utf8_encode($url));
2.
print mb_convert_encoding($url, 'HTML-ENTITIES', "UTF-8");
3.
$url = urlencode($url);
print file_get_contents($url);
4.
$content = file_get_contents($url);
print mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
Found in these questions:
file_get_contents - special characters in URL
PHP get url with special characters without urlencode:ing them!
file_get_contents() Breaks Up UTF-8 Characters
UPDATE: As you can see, a page is actually returned in my example, but it is not the expected page (the one you get when you type the URL into the browser).
URLs cannot contain "Ö"! Start from this basic premise. Any characters not within a narrowly defined subset of ASCII must be URL-encoded to be represented within a URL. The right way to do that is to urlencode or rawurlencode (depending on which format the server expects) the individual segment of the URL, not the URL as a whole.
E.g.:
$url = sprintf('https://se.timeedit.net/web/liu/db1/schema/s/s.html?tab=3&object=%s&type=subgroup&startdate=20150101&enddate=20300501',
rawurlencode('CM_949A11_1534_1603_DAG_DST_50_ÖVRIGT_1_1'));
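If several query parameters may need encoding, one alternative is to let http_build_query() encode each value separately. This is only a sketch: buildScheduleUrl() is a hypothetical helper, not part of the original answer, and it assumes the server expects UTF-8 and that PHP 7+ is available (for the "\u{00D6}" escape, which yields Ö as UTF-8 regardless of how the source file is saved).

```php
<?php
// Hypothetical helper (not from the original answer): builds the schedule URL
// and percent-encodes every query value individually.
function buildScheduleUrl(string $object): string
{
    $base = 'https://se.timeedit.net/web/liu/db1/schema/s/s.html';
    $params = [
        'tab'       => 3,
        'object'    => $object,
        'type'      => 'subgroup',
        'startdate' => '20150101',
        'enddate'   => '20300501',
    ];

    // PHP_QUERY_RFC3986 makes http_build_query() encode values the way
    // rawurlencode() does (spaces become %20 rather than +).
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Assumption: the server expects UTF-8, so Ö is passed as the bytes C3 96.
$url = buildScheduleUrl("CM_949A11_1534_1603_DAG_DST_50_\u{00D6}VRIGT_1_1");
echo $url, PHP_EOL;
```

The resulting string can then be passed to file_get_contents($url) as in the question. Whether the server accepts it still depends on the encoding it expects.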
You will still need to use the correct encoding for the string! Ö in ISO-8859-1 would be URL encoded to %D6, while in UTF-8 it would be encoded to %C3%96. Which one is the correct one depends on what the server expects.
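To see the difference between the two encodings, a quick check like the following may help. It is a minimal sketch that assumes PHP 7+ and the mbstring extension; the variable names are illustrative only.

```php
<?php
// "\u{00D6}" produces Ö encoded as UTF-8 (bytes C3 96), independent of the
// encoding the source file itself is saved in.
$utf8 = "\u{00D6}VRIGT";

// Convert the same text to ISO-8859-1, where Ö is the single byte D6.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

$utf8Encoded   = rawurlencode($utf8);
$latin1Encoded = rawurlencode($latin1);

echo $utf8Encoded, PHP_EOL;   // %C3%96VRIGT
echo $latin1Encoded, PHP_EOL; // %D6VRIGT
```

If one variant returns the wrong page, trying the other is a reasonable next step, since the URL alone does not reveal which encoding the server expects.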