Problem retrieving JSON content from a URL in PHP


I use PHP and would like to retrieve data from a URL, JSON-decode it, and put all entries with their field data into an array.

The URL is outside my control and returns data in a format that looks like JSON. The problem seems to be in the data the URL provides. I noticed some apostrophes (') that might confuse the JSON decoding, so I replace them with str_replace before decoding.

I cannot get this to work when I read from the URL: $arrayElements is NULL. However, if I open the URL in a web browser and copy its content into a string variable, everything works fine.

The code looks like this:

    // URL of the JSON API
    $url = 'https://www.tilastopaja.com/json/swe/sweAPI.php?type=clubs';

    // Fetch the content from the URL
    $jsonData = file_get_contents($url);

    //Handle '-exceptions
    $jsonData = str_replace("I'm a runner IF", "Im a runner IF", $jsonData);
    $jsonData = str_replace("Renners Runner's IF", "Renners Runners IF", $jsonData);
    $jsonData = str_replace("Runners' Store IK", "Runners Store IK", $jsonData);

    // Decode the JSON data into a PHP array
    $arrayElements = json_decode($jsonData, true);

    // Check if decoding was successful
    if ($arrayElements === null) {
        die('Error decoding JSON data');
    }

    // Check content of array
    var_dump($arrayElements);

I have checked that $jsonData = file_get_contents($url); reads the data and that the content ends up in $jsonData.

Additionally, I removed the CR/LF characters by adding the following before json_decode:

    //Handle CR/LF?
    $jsonData = str_replace("\r\n", "", $jsonData);
    $jsonData = str_replace("\n", "", $jsonData);
    $jsonData = str_replace("\r", "", $jsonData);

But I am still not able to json_decode the URL content.

Any ideas on how I can overcome this and read the data provided by the URL? As mentioned, the URL is public and I cannot change it. Please disregard the non-optimal coding style; this is just for clarity.

Best regards,

/Niklas


Solution

  • When retrieved from that URL, the first three bytes of the response are a byte order mark (BOM), a signature at the start of the stream that indicates the content is UTF-8. The UTF-8 BOM is EF BB BF, which you can see if you do var_dump(dechex(ord($jsonData[0])));, var_dump(dechex(ord($jsonData[1])));, and var_dump(dechex(ord($jsonData[2])));. json_decode does not accept a leading BOM, so it returns NULL; a string copied from the browser presumably no longer contains it, which would explain why the pasted version decodes. See the UTF-8 section of https://en.wikipedia.org/wiki/Byte_order_mark

    You can skip these bytes; the rest of the string then decodes without any string substitutions:

    // URL of the JSON API
    $url = 'https://www.tilastopaja.com/json/swe/sweAPI.php?type=clubs';
    
    // Fetch the content from the URL
    $jsonData = file_get_contents($url);
    
    // Skip BOM
    $jsonData = substr($jsonData, 3);
    
    // Decode the JSON data into a PHP array
    $arrayElements = json_decode($jsonData, true);
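
The unconditional substr() above assumes the response always starts with a BOM. If the server ever stops sending it, the first three characters of the JSON would be cut off instead. A minimal sketch of a more defensive variant follows; the helper names stripUtf8Bom and decodeJsonBody are made up for illustration, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later:

```php
<?php
/**
 * Remove a leading UTF-8 BOM (bytes EF BB BF) only if one is present.
 * Hypothetical helper, not part of PHP itself.
 */
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    // strncmp compares the first 3 bytes in a binary-safe way
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

/**
 * Decode a JSON body into PHP arrays, tolerating a leading BOM.
 * Hypothetical helper; throws JsonException with the parser's
 * message instead of silently returning null on invalid input.
 */
function decodeJsonBody(string $body)
{
    return json_decode(stripUtf8Bom($body), true, 512, JSON_THROW_ON_ERROR);
}

// Stand-in for the file_get_contents() result: BOM followed by JSON.
// The "clubs"/"name" structure is invented for this example.
$sample = "\xEF\xBB\xBF" . '{"clubs":[{"name":"Runners\' Store IK"}]}';
$arrayElements = decodeJsonBody($sample);
```

With the real URL you would pass the result of file_get_contents($url) to decodeJsonBody() instead of $sample. The sample also shows that an apostrophe inside a JSON string is valid, so the str_replace calls from the question should not be needed. On older PHP versions without JSON_THROW_ON_ERROR, checking json_last_error_msg() after json_decode gives the same diagnostic.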