Search code examples
google-apps-scriptgoogle-sheets

Remove duplicates based on multi column and keep latest entry in google sheets using Apps Script


I have a spreadsheet like this structure

table

I want to remove remove duplicates based on multi column and keep latest entry, so result I want like this

table_result

I have tried with this function, but latest entry removed.

function removeDuplicates() {
  function removeDuplicates() {
  var checkHeadeTitles = ["School Name", "Class Name", "Group Name"]; // Duplicate based this header.

  var ss = SpreadsheetApp.getActive();
  var sh = ss.getSheetByName("Sheet1");
  var header = sh.getRange(1, 1, 1, sh.getLastColumn()).getValues()[0];
  var columnNumber = checkHeadeTitles.map(e => {
    var idx = header.indexOf(e);
    if (idx == -1) {
      throw new Error(`No header tilte of '${e}'.`);
    }
    return idx + 1;
  });
  sh.getRange(2, 1, sh.getLastRow() - 1, sh.getLastColumn()).removeDuplicates(columnNumber);

  var row = sh.getDataRange().getValues().findIndex(r => r.join("") == "");
  if (row == -1) return;
  sh.deleteRow(row + 1);
}
}```

Please help me.
Thank you

Solution

  • Modification points:

    • About I have tried with this function, but latest entry removed., in the current stage, it seems that this situation is the current specification of removeDuplicates. When removeDuplicates is used, it seems that the duplicate rows are removed while the top row is kept.

    In order to achieve your goal, how about the following modifications?

    Modified script 1:

    In this modification, the values that removed the duplicate rows and the empty rows are created and the created values are put into the cells.

    function removeDuplicates1() {
      var checkHeadeTitles = ["School Name", "Class Name", "Group Name"]; // Duplicate based this header.
      var ss = SpreadsheetApp.getActive();
      var sh = ss.getSheetByName("Sheet1");
      var range = sh.getDataRange();
      var [header, ...values] = range.getValues();
      var index = checkHeadeTitles.map(e => {
        var idx = header.indexOf(e);
        if (idx == -1) {
          throw new Error(`No header tilte of '${e}'.`);
        }
        return idx;
      });
      var v = [...values.reverse().reduce((m, r) => {
        var k = index.map(e => r[e]).join("");
        if (k && !m.has(k)) {
          m.set(k, r);
        }
        return m;
      }, new Map()).values()].reverse();
      range.offset(1, 0).clearContent().offset(0, 0, v.length, v[0].length).setValues(v);
    }
    
    • In this case, the process cost is low. But, when you want to keep the text styles of each row, this modification cannot be used.

    Modified script 2:

    In this modification, the removed the duplicate rows are removed.

    function removeDuplicates2() {
      var checkHeadeTitles = ["School Name", "Class Name", "Group Name"]; // Duplicate based this header.
      var ss = SpreadsheetApp.getActive();
      var sh = ss.getSheetByName("Sheet1");
      var [header, ...values] = sh.getDataRange().getValues();
      var index = checkHeadeTitles.map(e => {
        var idx = header.indexOf(e);
        if (idx == -1) {
          throw new Error(`No header tilte of '${e}'.`);
        }
        return idx;
      });
      var { rows } = values.reverse().reduce((o, r, i, a) => {
        var k = index.map(e => r[e]).join("");
        if (k && !o.s.has(k)) {
          o.s.add(k);
        } else {
          o.rows.push(a.length - i + 1);
        }
        return o;
      }, { s: new Set(), rows: [] });
      if (rows.length == 0) return;
      rows.forEach(e => sh.deleteRow(e));
    }
    
    • In this case, the process cost is high. But, when you want to keep the text styles of rows, this modification can be used.

    • When you want to reduce the process cost, I think that Sheets API can be used for this situation. In that case, please modify this script as follows. And, please enable Sheets API at Advanced Google services.

      • From

        rows.forEach(e => sh.deleteRow(e));
        
      • To

        var sheetId = sh.getSheetId();
        var requests = rows.map(e => ({ deleteDimension: { range: { sheetId, startIndex: e - 1, endIndex: e, dimension: "ROWS" } } }));
        Sheets.Spreadsheets.batchUpdate({ requests }, ss.getId());
        

    References: