Search code examples
rdataframegroupingunique

Add unique ID column based on values in two other columns (lat, long)


This question has been asked but I'm looking for a more complete answer/slightly modified output.

I have a dataset with Lat and Long values in separate columns and want to create a unique ID for each unique combination of Lat and Long.

I'll borrow an example dataset from an older post asking the same question but for which I need a slightly different solution (Add ID column by group).

d <- read.table(text='LAT LONG
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4170 
13.5330 -15.4170 
13.5330 -15.4170 
13.5340 -14.9350 
13.5340 -14.9350 
13.5340 -15.9170 
13.3670 -14.6190', header=TRUE)

The solution given was:

d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))

#       LAT    LONG Cluster_ID
# 1  13.533 -15.418          2
# 2  13.533 -15.418          2
# 3  13.533 -15.418          2
# 4  13.533 -15.418          2
# 5  13.533 -15.417          3
# 6  13.533 -15.417          3
# 7  13.533 -15.417          3
# 8  13.534 -14.935          4
# 9  13.534 -14.935          4
# 10 13.534 -15.917          1
# 11 13.367 -14.619          5

But how do you get the interaction command to preserve order so that the first Cluster_ID above would be 1 (full vector for last column would be 1,1,1,1,2,2,2,3,3,4,5 instead of 2,2,2,2,3,3,4,4,1,5)? It's unclear how the new factor order (converted to numeric) is determined.

I have also been trying to find equivalent way of doing this using group_by in dplyr but can't figure out how to output the tibble table as a dataframe (older solutions on SO seem to use depreciated dplyr commands).

Thanks!


Solution

  • We could use match

    transform(d, Cluster_ID = match(paste0(LAT, LONG), unique(paste0(LAT, LONG))))
    

    Or convert the 'LAT', 'LONG' to sequence and then do the interaction

    transform(d, Cluster_ID = as.integer(interaction(match(LAT, 
      unique(LAT)),  match(LONG, unique(LONG)), drop=TRUE, lex.order = FALSE)))