Search code examples
openrefine

Select multiple repeated records OpenRefine


The table Locations has the following items :

enter image description here

The problem is that there are some rows which are "semi-repeated" (all the elements are equals except for the attribute attb that's an integer). I want to delete all repeated rows and append all the attributes attb, separated by commas.

For example if I have these two rows :

Attb | City | County | Latitude | Longitude
--------------------------------------------
 1   |  c1  |  co1   |    l1    |  long1
--------------------------------------------    
 2   |   c1 |  co1   |    l1    |  long1

Using OpenRefine, I want to delete the second row and append the attb value of the second row to the first row attb'value. So the output expected should be :

Attb | City | County | Latitude | Longitude
--------------------------------------------
 1,2 |  c1  |  co1   |   l1     |   long1

Let me know if I was clear enough.

Thanks in advance.


Solution

  • Create a key for each row which combines the values that will be replicated. You can do this using:

    Edit Column->Add column based on this column

    Then use a GREL expression something like:

    cells["City"].value + cells["County"].value + cells["Latitude"].value + cells ["Longitude"].value
    

    Call the column 'Key' or similar.

    Note that the key doesn't include the content of the Attb column. Move the new 'Key' column to be the first column in the project Sort by the 'Key' column and apply the Sort permanently Use 'Blank down' on the Key column

    Make sure OR is in 'Record' mode (towards top right of grid). You should see that those duplicate rows are now part of the same OR record.

    On the Attb column use Edit Cells->Join Multi-valued cells and join them with a comma

    Remove the 'Key' column and switch OpenRefine back into 'Row' mode. Now use 'Facet by blank' on the Attb column, find those rows with a blank Attb and remove those rows.

    You should now have the merged rows as you want.