Tags: php, json, web-services, doi

Parsing paginated JSON from a web service


I am trying to parse a large amount of JSON data generated from a remote web service. The output is paginated across 500 URIs, and each URI contains 100 JSON objects. I need to match a property in each JSON object, its DOI (a digital object identifier), against a corresponding field fetched from a local database, and then update the record.

The issue I am having is controlling my looping constructs to seek out the matching JSON DOI while making sure that all the data has been parsed.

As you can see, I have tried a combination of break and continue statements, but I am not able to 'move' beyond the first URI.

I later introduced a flag variable to help control the loops, to no effect.

    while ($obj = $result->fetch_object()) {
        for ($i = 1; $i <= $outputs_json['meta']['response']['total-pages']; $i++) {
            $url = 'xxxxxxxxxxxxxxx&page%5Bnumber%5D=' . $i . '&page%5Bsize%5D=100';
            if (($outputs = json_decode(file_get_contents($url), true)) === false) {
                // decode failed - nothing to do
            } else {
                try {
                    $outputs = json_decode(file_get_contents($url), true);
                    $j = 0;
                    do {
                        $flag = false;
                        $doi = trim($outputs['data'][$j]['attributes']['identifiers']['dois'][0], '"');
                        if (!(utf8_encode($obj->doi) === $doi)) {
                            continue;
                        } else {
                            $flag = true;
                            $j++;
                        }
                    } while ($j !== 101);
                    if ($flag === true) break;
                } catch (Exception $e) {
                }
            }
        }
    }

What is the optimal approach that guarantees every JSON object at every URI is parsed, and that CRUD operations are only performed on my database when a fetched record's DOI field matches the DOI property of the incoming JSON data?


Solution

  • I'm not 100% sure I understand every aspect of your question, but to me it would make sense to change the order of execution:

    • fetch page from external service
      • decode json and iterate through all 100 objects
        • get one DOI
        • fetch corresponding record from database
        • change db record
      • when all JSON objects are processed, fetch the next URL
    • repeat until all 500 URIs are fetched

    I don't think it's a good idea to fetch one record from the local DB and then try to find it across 500 different remote calls. Instead, it's better to base your workflow/loops on the fetched remote data and look up the corresponding records in your local DB.
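    A minimal sketch of that inverted workflow (untested; `$result` and the URL pattern are taken from your snippet, while `$totalPages`, `extractDoi()`, and `updateRecord()` are placeholder names you would adapt to your code):

    ```php
    <?php
    // Pull the quoted DOI out of one JSON object from the page payload.
    function extractDoi(array $item): string {
        return trim($item['attributes']['identifiers']['dois'][0], '"');
    }

    // 1. Index local records by DOI once, so each remote DOI is an O(1) lookup
    //    instead of a scan. $result is the mysqli result set from the question.
    $records = [];
    while ($obj = $result->fetch_object()) {
        $records[utf8_encode($obj->doi)] = $obj;
    }

    // 2. Walk every page of the remote service exactly once.
    for ($page = 1; $page <= $totalPages; $page++) {
        $url = 'xxxxxxxxxxxxxxx&page%5Bnumber%5D=' . $page . '&page%5Bsize%5D=100';
        $outputs = json_decode(file_get_contents($url), true);
        if ($outputs === null) {
            continue; // json_decode() returns null on failure, not false
        }

        // 3. Look up each JSON object's DOI in the local index.
        foreach ($outputs['data'] as $item) {
            $doi = extractDoi($item);
            if (isset($records[$doi])) {
                updateRecord($records[$doi], $item); // hypothetical update helper
            }
        }
    }
    ```

    This way every object on every page is visited exactly once, and the database is only touched when a DOI actually matches, which is exactly the guarantee you asked for.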

    If you think that approach will fit your task - I can of course help you with the code :)