Search code examples
pythongoogle-sheetsgoogle-sheets-apigspread

how to remove duplicate data in google sheets by python


when i import the data into google sheets it keeps showing the name even though it already exists, how can i automatically remove those duplicates by python code. thanks everyone. Is is enter image description here

nrows = len(sheet.col_values(1))
sheet.update_cell(nrows + 1, 1, name)
    

Solution

  • I believe your goal is as follows.

    • You want to remove the duplicated rows from the following your sample Spreadsheet. This image is from your question.

    • From your showing script, you want to achieve this using gspread for python.

    • You have already been able to get and put values to Spreadsheet using Sheets API.

    In this case, how about the following sample script?

    Sample script:

    spreadsheetId = "###" # Please set the Spreadsheet ID.
    sheetName = "Sheet1" # Please set the sheet name.
    rng = "A2:C" # Please set the range you want to check.
    
    # Retrieve values from the sheet.
    spreadsheet = client.open_by_key(spreadsheetId)  # or spreadsheet = client.open("### spreadsheet name ###")
    worksheet = spreadsheet.worksheet(sheetName)
    values = worksheet.get(rng)
    
    # Removing duplicated rows.
    res = []
    obj = {}
    for r in values:
        if r[0] not in obj:
            obj[r[0]] = r
            res.append(r)
    
    # Clear range.
    spreadsheet.values_clear("'" + sheetName + "'!" + rng)
    
    # Put the values to the sheet.
    worksheet.update("A2", res, value_input_option="USER_ENTERED")
    
    • When this script is run, by checking the column "A", the values, that the duplicated values were removed, are put to the columns "A" to "C".
    • If you want to use only the column "A", please modify rng = "A2:C" to rng = "A2:A"

    Note:

    • From your showing script, I use rng = "A2:C". If your actual range is different, please modify it.

    References: