Search code examples
google-apps-scriptgoogle-sheetsduplicates

Google Apps Script - Remove duplicates in large spreadsheets


Using the basic tutorial code from Google works fine for removing duplicates in GoogleSheets. However, since the maximum execution time is 6minutes, I'm running into timedout issues with larger spreadsheets that have thousands of rows. How would I be able to modify this code to work with larger spreadsheets? For example, how would I be able to set a maximum amount of rows to iterate, starting from the bottom?

This is the code:

function removeDuplicates() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = [];
  for (var i in data) {
    var row = data[i];
    var duplicate = false;
    for (var j in newData) {
      if (row.join() == newData[j].join()) {
        duplicate = true;
      }
    }
    if (!duplicate) {
      newData.push(row);
    }
  }
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}

Solution

  • I believe your goal as follows.

    • You want to delete the duplicated rows.
    • You want to achieve this using Google Apps Script.
    • You want to reduce the process cost of the script.

    In this case, I thought that removeDuplicates() might be used for achieving your goal. The sample script is as follows.

    Sample script:

    function removeDuplicates() {
      SpreadsheetApp.getActiveSheet().getDataRange().removeDuplicates();
    }
    

    Or, in this case, Sheets API can be also used as follows. When above script was not useful for your situation, please test the following script. When Sheets API is used, the process cost can be reduced a little. Ref Before you use this, please enable Sheets API at Advanced Gooele services.

    function removeDuplicates2() {
      const ss = SpreadsheetApp.getActiveSpreadsheet();
      const sheet = ss.getActiveSheet();
      const resource = { requests: [{ deleteDuplicates: { range: { sheetId: sheet.getSheetId() } } }] };
      Sheets.Spreadsheets.batchUpdate(resource, ss.getId());
    }
    

    References: