Tags: arrays, google-apps-script, google-sheets

Import of a Large Quantity of API Data


I have been working on importing a large amount of API data into a Google Sheet to build up a database. I have a list of over 25,000 links in cells in a sheet whose data I am looking to import. I have working Apps Script code that quickly and efficiently imports ten links, but when I tried the full set, I got errors like:

Exception: Service invoked too many times in a short time: urlfetch. Try Utilities.sleep(1000) between calls.

This shows up eleven seconds after running the code. I have also tried adding Utilities.sleep(1000) to the code in many places, to no avail. Here is the code that I am working with:

function dataImport() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName("Import1");
  var countRow = 1;
  var exportSheet = ss.getSheetByName("Data1");
  // Build one request object per non-empty URL in column A.
  var reqs = sheet.getRange("A2:A" + sheet.getLastRow()).getDisplayValues().reduce((ar, [url]) => {
    if (url) {
      ar.push({ url, muteHttpExceptions: true });
    }
    return ar;
  }, []);

  // Fetch every URL in a single call.
  var responses = UrlFetchApp.fetchAll(reqs);
  Utilities.sleep(1000);
  // Keep only successful responses and pull the fields of interest from each JSON payload.
  var temp = responses.reduce((ar, r) => {
    if (r.getResponseCode() == 200) {
      var { id, firstName, lastName, fullName, displayName, shortName, weight, height, position: { abbreviation }, team: { $ref }, birthPlace: { city, state, country }, experience: { years, displayValue }, jersey, active } = JSON.parse(r.getContentText());
      ar.push([id, firstName, lastName, fullName, displayName, shortName, weight, height, abbreviation, $ref, city, state, country, years, displayValue, jersey, active]);
    }
    return ar;
  }, []);
  var res = [['IDs', 'First Name', 'Last Name', 'Full Name', 'Display Name', 'Short Name', 'Weight', 'Height', 'Position', 'Team URL', 'City', 'State', 'Country', 'Years', 'Class', 'jersey', 'Active'], ...temp];
  exportSheet.getRange(countRow, 1, res.length, res[0].length).setValues(res);
}

I previously posted a question about this database when I was running into a different issue (Apps Script API Import via Links in Google Sheet Cells but Map Properties Undefined). When the quantity became an issue, it was suggested that I post a new question to help others who may run into the same problem. Once again, this code works with a limited dataset; the very large quantity seems to be what is causing the issues.

As I mentioned in the comments of the previous question, I'll repeat it here: when I tried adding the Utilities.sleep(1000) command, most of the time I ran into the same error nevertheless. In one placement, which I have unfortunately not been able to replicate, I instead got this error:

Exception: Service Spreadsheets failed while accessing document with id 18tRBBcVwChuSJ0G5_A09I7EoQnji6nmRvOWcX5ndALE
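
For reference, taking the error's suggestion literally would mean splitting the single fetchAll into several smaller calls with a sleep between each, roughly like the sketch below (the chunk size of 500 is arbitrary, for illustration only):

// Sketch: fetch in chunks, pausing between calls as the error message suggests.
function fetchWithPauses(reqs) {
  var responses = [];
  for (var i = 0; i < reqs.length; i += 500) {
    responses = responses.concat(UrlFetchApp.fetchAll(reqs.slice(i, i + 500)));
    Utilities.sleep(1000);
  }
  return responses;
}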

Upon researching the original error, I found a few questions relating to it, but none of them worked for what I was looking to do:

How do I use Google app script UrlFetchApp.fetchAll on >1000 requests without getting "Service invoked too many times in a short time" error?: This one featured a looping set of code, but it was being run as a Sheets extension and doesn't match how my code imports the data. The question, while it got comments, never received an answer that worked for the asker, and the asker also didn't try the command the error message suggested.

How do I fetch a column of URLs in Google Apps Script/Javascript?: This deals with a far smaller dataset, more similar to the test run in my previous question, where it was suggested to use fetchAll, which is how my code is currently set up and which isn't working with the larger set of data.

Below is the link to a copy of the sheet with all of the link data, in case that is helpful. I have also put the code I have been using in the Apps Script of that sheet. If I could get help figuring out the best way of doing this, it would be greatly appreciated!

https://docs.google.com/spreadsheets/d/1Nmjw7wCv5FzBFIE8QfqHOYGPioIkv79InBsJzpmEnyw/edit#gid=299593293

EDIT: I ran into more issues with the "batching" answer I ended up using here. While it is not the same issue listed above (hence the new post), I thought it might be helpful to link it here in case someone else runs into similar issues!

New Question: Try/Catch function inconsistency when "batching" using ScriptProperties


Solution

  • SUGGESTION

    While this approach might not be the most efficient, you can attempt to handle the exception by batching the UrlFetch calls.

    From my testing, it seems that the UrlFetchApp.fetchAll() method can only handle a maximum of 200 URLs in a single operation. In my experience, exceeding 200 triggers the exception.

    Note: This behaviour was tested on a consumer Google account. A paid Google Workspace domain account did not produce the urlfetch exception.
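
    If you want to observe the limit on your own account, a rough probe like the sketch below might help (https://example.com is just a placeholder endpoint; substitute any URL you are allowed to fetch):

    //Rough probe of the fetchAll size limit. The URL is a placeholder.
    function probeFetchAllLimit() {
      [100, 200, 250].forEach(function (n) {
        var reqs = Array(n).fill({ url: 'https://example.com', muteHttpExceptions: true });
        try {
          UrlFetchApp.fetchAll(reqs);
          console.log(n + ' requests: OK');
        } catch (e) {
          console.log(n + ' requests: ' + e.message);
        }
      });
    }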

    The tweaked script below batches the total URL requests in groups of 200 (e.g. if you have 27,000 requests, there will be 135 batches, each containing 200 requests). Basically, this involves handling potential exceptions during URL fetching and keeping track of the progress of data processing using script properties. This information can serve as your starting point to formulate the batching strategy.

    Tweaked Script

    var scriptProperties = PropertiesService.getScriptProperties();

    function dataImport() {
      var ss = SpreadsheetApp.getActiveSpreadsheet();
      var sheet = ss.getSheetByName("Import1");
      var exportSheet = ss.getSheetByName("Data1");
      var reqs = sheet.getRange("A2:A" + sheet.getLastRow()).getDisplayValues().reduce((ar, [url]) => {
        if (url) {
          ar.push({ url, muteHttpExceptions: true });
        }
        return ar;
      }, []);

      //Storage of the current run's responses
      var bucket = [];
      var batchSize = 200;
      var batches = batchArray(reqs, batchSize);
      //Resume from the last batch that failed (or start from 0 on the first run)
      var startingBatch = scriptProperties.getProperty("batchNumber") == null ? 0 : parseInt(scriptProperties.getProperty("batchNumber"));
      var processedBatches = scriptProperties.getProperty("processedBatches") == null ? 0 : parseInt(scriptProperties.getProperty("processedBatches"));

      console.log(`Total: ${reqs.length}.\n${batches.length} batches.`);

      if (processedBatches >= (batches.length - 1)) {
        console.log('All data has been processed already.');
      } else {
        //Start from the last batch that stopped and still needs to be processed.
        for (let i = startingBatch; i < batches.length; i++) {
          console.log(`Processing batch index #${i}`);
          try {
            var responses = UrlFetchApp.fetchAll(batches[i]);
            bucket.push(responses);
            //Store the latest successful batch index number (overwrites the previous value)
            scriptProperties.setProperty("processedBatches", i);
          }
          //Catch the batch index number where the run stopped due to a URL fetch exception
          catch (e) {
            //Remember the batch that encountered an error so it is processed again on the next call.
            scriptProperties.setProperty("batchNumber", i);
            console.log(`Batch index #${i} stopped`);
            break;
          }
        }

        //Flatten the per-batch response arrays into a single list
        const initialRes = [].concat.apply([], bucket);

        var temp = initialRes.reduce((ar, r) => {
          if (r.getResponseCode() == 200) {
            var { id, firstName, lastName, fullName, displayName, shortName, weight, height, position: { abbreviation }, team: { $ref }, birthPlace: { city, state, country }, experience: { years, displayValue }, jersey, active } = JSON.parse(r.getContentText());
            ar.push([id, firstName, lastName, fullName, displayName, shortName, weight, height, abbreviation, $ref, city, state, country, years, displayValue, jersey, active]);
          }
          return ar;
        }, []);
        var res = [...temp];

        //Add table headers on the first run only
        exportSheet.getLastRow() == 0 && exportSheet.appendRow(['IDs', 'First Name', 'Last Name', 'Full Name', 'Display Name', 'Short Name', 'Weight', 'Height', 'Position', 'Team URL', 'City', 'State', 'Country', 'Years', 'Class', 'jersey', 'Active']);

        //Add table data
        var result = () => {
          return temp.length != 0 && exportSheet.getRange(exportSheet.getLastRow() + 1, 1, res.length, res[0].length).setValues(res);
        }
        result() && console.log(`Processed: ${res.length}.`);
      }
    }

    //Function to chunk the request data based on batch size
    function batchArray(arr, batchSize) {
      var batches = [];

      for (var i = 0; i < arr.length; i += batchSize) {
        batches.push(arr.slice(i, i + batchSize));
      }

      return batches;
    }

    //Optional function to check the latest processed batch
    function checkBatch() {
      console.log(`${parseInt(scriptProperties.getProperty("processedBatches")) + 1} batches have been processed.`);
    }
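
    If you ever need to restart the batching from scratch (e.g. after clearing the Data1 sheet), a small helper along these lines could clear the tracking properties; this is a hypothetical addition, not part of the original script:

    //Optional helper (hypothetical): clear the tracking properties to restart from batch 0
    function resetBatches() {
      scriptProperties.deleteProperty("batchNumber");
      scriptProperties.deleteProperty("processedBatches");
    }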
    

    Here's a sample step flow of how the script works:

    1. Run the dataImport function.

    2. Process each batch (starting either from batch index number 0 or from the last tracked index number) with the UrlFetchApp.fetchAll() method.

    3. If a batch does not trigger an exception, its results will be appended to the Data1 sheet.

    4. If a batch triggers an exception, the batchNumber property will be updated with that batch's index number and the script stops.

    5. Repeat Steps 1 and 2.

    6. Repeat Step 3 or Step 4.

    You can either put the dataImport function on a time-based trigger (to continue running in the background until it finishes all remaining batches) or manually run it from the Apps Script editor, as in the sketch below.
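
    For example, a minimal sketch for installing such a trigger (the five-minute interval is an arbitrary choice; tune it to how long each run takes):

    //Minimal sketch: run dataImport every 5 minutes until all batches are processed.
    function installTrigger() {
      ScriptApp.newTrigger('dataImport')
        .timeBased()
        .everyMinutes(5)
        .create();
    }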

    Quick Demo

    • E.g. you have 400 rows of data to be batched by 200 (2 batches). The first function run finishes batch index 0 but encounters an exception on batch index 1. The next function run then finishes batch index 1.

