Search code examples
google-apps-scriptgoogle-sheetsgoogle-sheets-formulaformula

Removing duplicate/opposite entries in Google Sheets


I have a sheet containing the following data: URL A, URL B, similar score in percent.

If URL A is 98% similar to URL B, it means that URL B is 98% similar to URL A, and listed as well.

I want to find and eliminate these duplicates/reversed entries. For now, I have tried having two extra columns concatenating URL A+URL B in one, and URL B+URL A in one. This way I have unique identifiers.

After this I'm kinda stuck, because I'm dealing with a lot of variables, as data is in two different rows, and two different columns. I might be looking into a script, taking the A+B value, iterating through the B+A value until it finds a match, and somehow marks this (or simply just deletes it), since my knowledge of formulas for highlighting these duplicates are falling short.

This sheet shows the concept - the first 100 rows (it's about 11K in total): https://docs.google.com/spreadsheets/d/1YKsguAn1lYjV4FlP_6_TlKGvFcpFAEzn7bpAyOEmozQ/edit?usp=sharing

Any suggestions for what I should look into?


Solution

  • I ended up with a solution where I sorted by URL A and implemented this formula:

    =IF(A2<B2,A2&B2,B2&A2)
    

    This way I had the concatenation the same way for the real one and the opposite. I didn't know you could use "<" on strings.

    After this, I could delete duplicated values in the column with the formula above.