I have a table which looks the following way:
Name | Region | Id |
---|---|---|
Name1 | US | 123 |
Name1 | US | |
Name2 | US | 122 |
Name3 | US | 124 |
Name1 | UK | |
Name1 | UK | 135 |
Name2 | UK | 140 |
Name3 | US |
As you can see there are empty values in the ID column which I want to be filled with the values of other rows if the column Name and Region correspond. So it should look like this:
Name | Region | Id |
---|---|---|
Name1 | US | 123 |
Name1 | US | 123 |
Name2 | US | 122 |
Name3 | US | 124 |
Name1 | UK | 135 |
Name1 | UK | 135 |
Name2 | UK | 140 |
Name3 | US |
Some of the columns will still be left empty but the majority will be fine. Is there a way to do so in R? Maybe it will be easier to so in Excel even (Vlookup maybe I am not sure). Share if you have any ideas)
The order of rows is important here Eg - Name3 US in row 4 and row 8 are not related and treated differently. We can create a new group using consecutive_id
and then fill
the Id
values for each group.
library(dplyr)
library(tidyr)
df %>%
mutate(group_id = consecutive_id(Name, Region)) %>%
group_by(group_id) %>%
fill(Id, .direction = "downup") %>%
ungroup
# A tibble: 8 × 4
# Name Region Id group_id
# <chr> <chr> <int> <int>
#1 Name1 US 123 1
#2 Name1 US 123 1
#3 Name2 US 122 2
#4 Name3 US 124 3
#5 Name1 UK 135 4
#6 Name1 UK 135 4
#7 Name2 UK 140 5
#8 Name3 US NA 6
group_id
column is just for explanation purposes on how groups are created, you may drop the column if you don't need it.
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(Name = c("Name1", "Name1", "Name2", "Name3", "Name1",
"Name1", "Name2", "Name3"), Region = c("US", "US", "US", "US",
"UK", "UK", "UK", "US"), Id = c(123L, NA, 122L, 124L, NA, 135L,
140L, NA)), row.names = c(NA, -8L), class = "data.frame")