I am attempting to use the Airtable API to retrieve records from my data there - specifically, a list of URLs I have in column cells.
I wrote a function, get_airtable_records
, to do the API call via curl
and it works - returning results as a Json object. Specifically, I am pushing the URLs to an array, $article_urls
.
The only problem is, Airtable limits the return of results to "pages" of a maximum 100 records, and my data contains more than that. The API accepts parameters maxRecords
and pageSize
but the more important one, pageSize
, is still capped at 100.
What Airtable does also return is another Json value, offset
, which is used in such cases for pagination. The offset
is a record ID intended to be used as an input parameter (also called offset
). You can use it to denote the starting record in a subsequent additional API call. I understand this.
What I don't understand is how to modify my code to account for the possibility of needing to poll Airtable again.
In other words, we should always do a starting run from scratch, when there is no offset
value.
Then, if an offset
value is present in returned results, we should go around again - until an offset
value is not present.
Here is what I have.
// Make get request, store result in array
$articles = get_airtable_records($offset); // $offset won't exist at the start
// Prepare Article URLs list as an array
if (!isset($article_urls)) {
$article_urls = array();
}
// For each URL found in Airtable
foreach($articles['records'] as $record){
$url = $record['fields']['Published URL'];
// Add to our array list
if (!empty($url)) {
array_push($article_urls, $url);
}
}
// URL list after first pass:
echo '<pre>';
print_r($article_urls);
echo '</pre>';
// May hit a max of 100
// echo 'Offset: ' . $articles['offset'];
// Value like "itrJYSLx0RfslI80f/recEu6TiPTPCSDxg5" may exist.
// If so, go back to start, do get_airtable_records($offset) again and array_push
// Until, once more there is no "offset" value at end
I am speculating that some sort of while
loop will be useful... ?
A couple of things are true...
offset
value needing to be passed, since it starts from record 0.offset
value, which should be used to make another pass.offset
value, since it will have returned the final page of exhausted results, and there is no need to start again.Thanks largely to @anthony's answer to a similar question here, I seem to have some working code...
// Prepare Article URLs list as an array
$article_urls = array();
// Call Airtable records in pages of 100 max
do {
// Offset is either inherited from last page's results, or is nothing
$offset = $articles['offset'] ?: "";
// Make get request, store result in array
$articles = get_airtable_records($offset);
// For each URL found in Airtable
foreach($articles['records'] as $record){
$url = $record['fields']['Published url'];
// Add to our array list
if (!empty($url)) {
array_push($article_urls, $url);
}
}
} while(!empty($articles['offset'])); // If there's an offset value (ie. starting record of next page), do again
// Output URL list for check
echo '<pre>';
print_r($article_urls);
echo '</pre>';
Explanation seems to be:
Use a do
while
loop.
At the start of this, set offset
to be either the value inherited from the previous run, or nothing.
My get_airtable_records
function was already limiting the presence or not of offset
in the API call, with the following, which adds the offset
query string to the URL for the next API call if one is present...
if (!empty($offset)) {
$q_offset = '&offset='.$offset;
}
I have tested this and it gave me all 137 results from two pages in to my $article_urls
array. I haven't tested it with any more than two pages of results.