
PHP cURL - Link Header


I've made a code search application that interacts with GitHub's API, and I want to add pagination to it. The pagination data is held in the Link header, like so:

Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next", <https://api.github.com/user/repos?page=50&per_page=100>; rel="last"

My code:

    // API CONNECTION
    $url = 'https://api.github.com/search/code?q=' . $term  . '+language:' . $lang . '&per_page=' . $pp;
    $cInit = curl_init();
    curl_setopt($cInit, CURLOPT_URL, $url);
    curl_setopt($cInit, CURLOPT_RETURNTRANSFER, 1); // 1 = TRUE
    curl_setopt($cInit, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); 
    curl_setopt($cInit, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($cInit, CURLOPT_USERPWD, $user . ':' . $pwd);
    curl_setopt($cInit, CURLOPT_HTTPHEADER, array('Accept: application/vnd.github.v3.text-match+json')); // ADD THE HIGHLIGHTED CODE SECTION

    // MAKE CURL OUTPUT READABLE
    $output = curl_exec($cInit);
    $items = json_decode($output, true); 
    curl_close($cInit); // CLOSE OUR API CONNECTION

Now I've added curl_setopt($cInit, CURLOPT_HEADER, true);

Since then, var_dump($items), which worked before I added CURLOPT_HEADER to my code, returns NULL instead, which in turn breaks the entire project.

Doing some debugging, I found that var_dump($output) still outputs data and, as expected, now includes the headers. However, the Link header looks like this:

Link: ; rel="next", ; rel="last"

when it shouldn't. As far as I can tell, the Link header has actually broken my code.

I've tried various things, like running urlencode on $output before decoding it, but to no avail. So, how do I fix this?


Solution

  • Setting curl_setopt($cInit, CURLOPT_HEADER, true); (or 1 instead of true) means that instead of just getting the body back, the $output variable also includes the headers. This is why trying to json_decode() it doesn't work - with the headers at the top, it's no longer a valid JSON string.
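
    As a rough illustration of that failure, here is a minimal sketch using a made-up response string (the status line, header and body are placeholders, not real GitHub output):

    ```php
    <?php
    // Hypothetical raw response, shaped like $output when CURLOPT_HEADER is on.
    $body = '{"total_count":1,"items":[]}';
    $raw  = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n" . $body;

    // Headers in front of the JSON make the whole string invalid JSON.
    $fromRaw  = json_decode($raw, true);
    $rawError = json_last_error();          // captured before the next decode resets it

    // The body on its own decodes as expected.
    $fromBody = json_decode($body, true);

    var_dump($fromRaw);                     // NULL
    var_dump($fromBody['total_count']);     // int(1)
    ```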

    This SO question has more details on the various ways you can try and parse out the headers from your body, depending on the needs of your server. If you're not using proxies, redirects or anything odd, then the accepted answer from that question may work for you (adapted for your variables):

    $header_size = curl_getinfo($cInit, CURLINFO_HEADER_SIZE);
    $header = substr($output, 0, $header_size);
    $body = substr($output, $header_size);
    

    If you're concerned that you're dealing with GitHub and don't know their infrastructure or what they might change on you (the GitHub search documentation does warn that it may change without advance notice, after all), then you may be better off using the CURLOPT_HEADERFUNCTION option, which lets you assign a callback function that cURL calls for each header line that comes back from the request. From the documentation, its value must be:

    A callback accepting two parameters. The first is the cURL resource, the second is a string with the header data to be written. The header data must be written by this callback. Return the number of bytes written.

    You can see examples of this in the same previous SO question - it can be the usual trivial cases (a named function, or a PHP callable array), or even a closure which populates a global $headers array.
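
    A minimal sketch of the closure variant follows. The names $headers and $collectHeader are illustrative, and the header lines are simulated so it runs without a request. It assumes one value per header name, so a repeated header would overwrite the earlier one.

    ```php
    <?php
    // Collected headers, keyed by lower-cased name (header names are case-insensitive).
    $headers = [];

    // Callback for CURLOPT_HEADERFUNCTION: cURL calls it once per header line.
    $collectHeader = function ($ch, string $line) use (&$headers): int {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {          // skips the status line and the blank line
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
        return strlen($line);               // must return the number of bytes handled
    };

    // Assumed wiring with $cInit from the question (leave CURLOPT_HEADER off):
    // curl_setopt($cInit, CURLOPT_HEADERFUNCTION, $collectHeader);
    // $output = curl_exec($cInit);        // body only, safe to json_decode()

    // Simulated header lines in place of a real response.
    $lines = [
        "HTTP/1.1 200 OK\r\n",
        "Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel=\"next\"\r\n",
        "\r\n",
    ];
    foreach ($lines as $line) {
        $collectHeader(null, $line);
    }
    echo $headers['link'], PHP_EOL;
    ```

    Because the headers never reach $output with this approach, the original json_decode($output, true) call keeps working unchanged.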

    Having tested these methods, the Link header showed up correctly for me whenever there was more than one page of results. If there was only one page (or no results), the Link header was omitted from the GitHub response entirely.
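
    Once you have the header value, pulling the page URLs out of it could be sketched as below. parseLinkHeader is a hypothetical helper, not part of cURL or the GitHub API, and it assumes the <url>; rel="name" layout shown in the question.

    ```php
    <?php
    // Hypothetical helper: turn a Link header value into a [rel => url] array.
    // Returns an empty array when the header is missing (single page or no results).
    function parseLinkHeader(?string $value): array
    {
        $links = [];
        if ($value === null || $value === '') {
            return $links;
        }
        // Each entry looks like: <url>; rel="name"
        preg_match_all('/<([^>]+)>\s*;\s*rel="([^"]+)"/', $value, $matches, PREG_SET_ORDER);
        foreach ($matches as $m) {
            $links[$m[2]] = $m[1];
        }
        return $links;
    }

    $link = '<https://api.github.com/user/repos?page=3&per_page=100>; rel="next", '
          . '<https://api.github.com/user/repos?page=50&per_page=100>; rel="last"';
    $pages = parseLinkHeader($link);
    echo $pages['next'], PHP_EOL; // https://api.github.com/user/repos?page=3&per_page=100
    ```

    If you dump the raw header into an HTML page, a browser will treat <https://...> as a tag and hide it, which may be why you saw Link: ; rel="next". Wrapping the output in htmlspecialchars() or viewing the page source should show the full value.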

    Without knowing what you're using for $term, $lang and $pp, this is a bit trickier to diagnose. Since you're also using a $user and $pwd combo for authorization, the results might differ from what the regular API endpoints return for publicly consumable data. I would first check with search queries that you know return many pages of results on public repositories.

    Last but not least, if you're writing an application to consume the GitHub API, I suggest standing on the shoulders of those who have been there before. For example, KNP Labs have a GitHub API wrapper for PHP which is very popular (with documentation on search and pagination), or if you're using Laravel there's a wrapper by Graham Campbell.