Tags: r, dataframe, algorithm, grouping, igraph

How to use igraph to reshape data by linking rows on a common value of two variables?


The sample dataset is as follows:

var1 var2 var3
a 1 2
b 2 3

I want to link the record with var1 = a to the record with var1 = b if var3 of the a record equals var2 of the b record.

The sample dataset would then become:

var1 var2 var3
a 1 3

The linking should also chain: if the row after var1 = b is another b row whose var2 equals the previous row's var3, that row is linked into the same record. For example:

var1 var2 var3
a 1 2
b 2 3
b 3 5
b 7 9
c 5 9

My desired result:

var1 var2 var3
a 1 5
b 7 9
c 5 9

Are there any ways to do this? Thank you!


According to zx8754's comment, igraph can be used to clean the data for this question. However, when I tried the following:

library(igraph)

df <- structure(list(var1 = c("a", "b", "b", "b", "c"),
                     var2 = c(1L, 2L, 3L, 7L, 5L),
                     var3 = c(2L, 3L, 5L, 9L, 9L)),
                class = "data.frame", row.names = c(NA, -5L))

g <- graph_from_data_frame(df)

The graph only showed var2 and ignored var3.

Question

  1. How can I connect a and b when their names are not the same?
  2. How can I add one more variable (var3) to the graph?
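
On the second point, it may help to look at how graph_from_data_frame() reads the columns: the first two columns are taken as the edge list, and any further columns are stored as edge attributes. So var3 is not dropped; it is attached to each edge rather than drawn as a vertex. A minimal sketch, assuming the intent is to have var2 and var3 as the edge endpoints (the names g_attempt and g_links are only illustrative):

```r
library(igraph)

df <- data.frame(
  var1 = c("a", "b", "b", "b", "c"),
  var2 = c(1L, 2L, 3L, 7L, 5L),
  var3 = c(2L, 3L, 5L, 9L, 9L),
  stringsAsFactors = FALSE
)

# The attempt from the question: edges run var1 -> var2,
# and var3 is kept as an edge attribute.
g_attempt <- graph_from_data_frame(df)
edge_attr_names(g_attempt)
E(g_attempt)$var3

# Assumed alternative: put var2 and var3 first so each row becomes
# an edge var2 -> var3, with var1 carried along as an edge attribute.
g_links <- graph_from_data_frame(df[, c("var2", "var3", "var1")],
                                 directed = TRUE)

# Number of weakly connected components in the var2 -> var3 graph.
n_comp <- components(g_links, mode = "weak")$no
```

Note that in this sample n_comp is 1: the values 5 and 9 are shared between the b and c rows, so a plain component search merges all five rows. It therefore does not reproduce the desired output, which only links consecutive rows.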

Solution

  • library(dplyr)
    # A new group starts wherever var2 differs from the previous row's var3.
    df %>%
       group_by(grp = cumsum(var2 != lag(var3, default = FALSE))) %>%
       summarise(var1 = first(var1), var2 = first(var2), var3=last(var3))
    
    # A tibble: 3 × 4
        grp var1   var2  var3
      <int> <chr> <int> <int>
    1     1 a         1     5
    2     2 b         7     9
    3     3 c         5     9
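
For readers without dplyr, the same grouping idea can be sketched in base R. This is not part of the original answer, only an equivalent under the assumption that rows are linked solely when they are adjacent; prev_var3, new_group, and result are illustrative names.

```r
df <- data.frame(
  var1 = c("a", "b", "b", "b", "c"),
  var2 = c(1L, 2L, 3L, 7L, 5L),
  var3 = c(2L, 3L, 5L, 9L, 9L),
  stringsAsFactors = FALSE
)

# var3 of the previous row (NA for the first row).
prev_var3 <- c(NA, head(df$var3, -1))

# A new group starts at the first row and wherever var2 breaks the chain.
new_group <- is.na(prev_var3) | df$var2 != prev_var3
grp <- cumsum(new_group)

# Keep var1 and var2 from the first row of each group, var3 from the last.
first_row <- !duplicated(grp)
last_row  <- !duplicated(grp, fromLast = TRUE)

result <- data.frame(
  var1 = df$var1[first_row],
  var2 = df$var2[first_row],
  var3 = df$var3[last_row],
  stringsAsFactors = FALSE
)
print(result)
```

Using NA for the missing previous value avoids relying on a sentinel such as FALSE (which is 0 when compared with integers), so a first row with var2 equal to 0 would still open a group.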