The sample dataset is as follows:
var1 | var2 | var3 |
---|---|---|
a | 1 | 2 |
b | 2 | 3 |
I want to link the record with var1 = a to the record with var1 = b if var3 of the a record equals var2 of the b record, so that the sample dataset becomes something like this:
var1 | var2 | var3 |
---|---|---|
a | 1 | 3 |
Also, if the next row after var1 = b is also b, then that record will be linked too, for example:
var1 | var2 | var3 |
---|---|---|
a | 1 | 2 |
b | 2 | 3 |
b | 3 | 5 |
b | 7 | 9 |
c | 5 | 9 |
My desired result:
var1 | var2 | var3 |
---|---|---|
a | 1 | 5 |
b | 7 | 9 |
c | 5 | 9 |
Are there any ways to do this? Thank you!
According to zx8754's comment, igraph can be used to perform the data cleaning for this question. However, I tried the following:
library(igraph)
df = structure(list(var1 = c("a", "b", "b", "b", "c"), var2 = c(1L,
2L, 3L, 7L, 5L), var3 = c(2L, 3L, 5L, 9L, 9L)), class = "data.frame", row.names = c(NA,
-5L))
g <- graph_from_data_frame(df)
The graph only showed var2 and ignored var3.
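Question: how can a and b be linked if their names are not the same?

library(dplyr)
# Start a new group whenever var2 does not continue the previous row's var3,
# then keep var1 and var2 from the first row and var3 from the last row of each group
df %>%
  group_by(grp = cumsum(var2 != lag(var3, default = FALSE))) %>%
  summarise(var1 = first(var1), var2 = first(var2), var3 = last(var3))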
# A tibble: 3 × 4
    grp var1   var2  var3
  <int> <chr> <int> <int>
1     1 a         1     5
2     2 b         7     9
3     3 c         5     9
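
If dplyr is not available, the same idea can be sketched in base R. This is only an illustration, and it rests on some assumptions: the rows are already in the order in which they should be chained, the data frame has at least one row, and a row continues the previous one exactly when its var2 equals the previous row's var3. The helper names starts, first_idx, last_idx and result are made up for this example.

```r
# Sample data from the question
df <- data.frame(
  var1 = c("a", "b", "b", "b", "c"),
  var2 = c(1L, 2L, 3L, 7L, 5L),
  var3 = c(2L, 3L, 5L, 9L, 9L)
)

# TRUE where a row starts a new chain, i.e. its var2 differs from the
# previous row's var3 (the first row always starts a chain)
starts <- c(TRUE, df$var2[-1] != df$var3[-nrow(df)])

# Row index of the first and last record of each chain
first_idx <- which(starts)
last_idx  <- c(first_idx[-1] - 1L, nrow(df))

# Collapse each chain: var1 and var2 from its first row, var3 from its last row
result <- data.frame(
  var1 = df$var1[first_idx],
  var2 = df$var2[first_idx],
  var3 = df$var3[last_idx]
)
print(result)
```

Like the dplyr version, this only compares adjacent rows, so the result depends on row order.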