
Assigning values to data based on grouping rows first


I have a data frame to which I am trying to add a new column of IDs (1 to however many are needed). To do this, I want to group all the rows that share the same id (data and example below). Each time I see an id, I want to rank the different locations within that id to create a location ID. I have added an example of what I'm looking for underneath this dataset.

id     city        location   locationid
20     london      central     
20     london      north
20     london      south
25     birmingham  north
25     birmingham  south
25     birmingham  east
30     manchester  greater
30     manchester  north
30     manchester  east
30     manchester  west
33     liverpool   central
33     liverpool   east

What I'm looking for:

id     city        location   locationid
20     london      central     1
20     london      north       2
20     london      south       4
25     birmingham  north       2
25     birmingham  south       4
25     birmingham  east        3
30     manchester  greater     1
30     manchester  north       2
30     manchester  east        3
30     manchester  west        5
33     liverpool   central     1
33     liverpool   east        2

So anything that is central/main/greater should be 1, and then, going around the compass, north, east, south and west should be 2 to 5 (I'm not overly fussed about the exact location IDs at this stage).
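The rule above is a fixed lookup from location name to code, so it can also be written as a named vector indexed by location. This is a minimal base R sketch, not the asker's code; the names location_codes and location_id() are hypothetical, and it assumes main and greater map to 1 exactly like central.

```r
# Hypothetical lookup table: central/main/greater -> 1, then north, east, south, west -> 2..5
location_codes <- c(central = 1L, main = 1L, greater = 1L,
                    north = 2L, east = 3L, south = 4L, west = 5L)

# Look up each location's code by name; names not in the table become NA
location_id <- function(location) {
  unname(location_codes[location])
}

location_id(c("central", "north", "south", "greater", "east", "west"))
#  [1] 1 2 4 1 3 5
```

Compared with a factor, a named vector makes the synonyms (main, greater) explicit and leaves unknown locations as NA, which is easy to spot afterwards.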


Solution

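    # Treat main/greater as central, then order the levels around the compass (N, E, S, W)
    quux$locationid <- as.integer(
      factor(ifelse(quux$location %in% c("main", "greater"), "central", quux$location),
             levels = c("central", "north", "east", "south", "west"))
    )
    quux$locationid
    #  [1] 1 2 4 2 4 3 1 2 3 5 1 3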
    

Your expected output lists 2 for row 12 (liverpool, east), but you said east should be 3, so I believe 3 is the correct value there.


Data

    quux <- structure(list(id = c(20L, 20L, 20L, 25L, 25L, 25L, 30L, 30L, 30L, 30L, 33L, 33L), city = c("london", "london", "london", "birmingham", "birmingham", "birmingham", "manchester", "manchester", "manchester", "manchester", "liverpool", "liverpool"), location = c("central", "north", "south", "north", "south", "east", "greater", "north", "east", "west", "central", "east")), class = "data.frame", row.names = c(NA, -12L))
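If the IDs should instead run from 1 up to however many distinct locations each id has (keeping the compass order within the group), one option is to compute the compass code as in the answer and then take a dense rank of it within each id using base R's ave(). This is a sketch under that assumption; the column names compass and locationid_within are hypothetical, and the data is rebuilt so the block runs on its own.

```r
# Same id/location data as above (stringsAsFactors = FALSE keeps ifelse() safe on R < 4.0)
quux <- data.frame(
  id = c(20L, 20L, 20L, 25L, 25L, 25L, 30L, 30L, 30L, 30L, 33L, 33L),
  location = c("central", "north", "south", "north", "south", "east",
               "greater", "north", "east", "west", "central", "east"),
  stringsAsFactors = FALSE
)

# Global compass code: central/main/greater = 1, then north, east, south, west = 2..5
quux$compass <- as.integer(factor(
  ifelse(quux$location %in% c("main", "greater"), "central", quux$location),
  levels = c("central", "north", "east", "south", "west")
))

# Dense rank of the compass code within each id: 1..(number of distinct codes in that id)
quux$locationid_within <- ave(quux$compass, quux$id,
                              FUN = function(x) match(x, sort(unique(x))))

quux$locationid_within
#  [1] 1 2 3 1 3 2 1 2 3 4 1 2
```

Here london gets 1, 2, 3 rather than 1, 2, 4, because south is the third distinct location seen for that id; the global compass codes are still available in the compass column.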