Pulling Pairs of Factors from within a list into a 2-column dataframe in R


I'm new to R and am trying to put pairs of factors that sit side by side in a list into a dataframe so that I can export them as edges for Gephi. I am trying to create a dataset that acts like a shopping list for each individual user's journey, where each edge is a journey from one shopping point to the next.

Here is sample data that I am testing on:

a <- c("a","a","a","b","b","a","a","b","a","a","c","d","c")
b <- c(12,22,44,22,33,55,33,66,88,55,33,66,77)
a1 <- data.frame(a,b)
b1 <- tapply(a1$b, a1$a, list)

Which looks like this:

$a
[1] 12 22 44 55 33 88 55

$b
[1] 22 33 66

$c
[1] 33 77

$d
[1] 66

Hence, "$a, $b, $c, $d" would be individual users and the vectors within would be their transaction journeys. I want the first row to be "12 22", then the second to be "22 44", etc., with the last being "33 77".

So far I have created a function called "pairingfunction" and have tried to use lapply with it, but it doesn't seem to work.

Here is what I have so far:

pairingfunction <- function(x) {
  pairdf <- data.frame()
  for (i in 1:(length(x)-1)){  
    a <- x[i] 
    b <- x[(i+1)]
    pairdf[(nrows(pairdf)+1)] <- a
    pairdf[(nrows(pairdf))] <- b
  } return(pairdf)
}

lapply(b1, pairingfunction)

If someone could help fix the function or let me know a better way than using lapply that would be fantastic. Thanks


Solution

  • You could leverage the nest() function from the tidyr package:

    library(tidyr)
    library(dplyr)
    
    a <- c("a","a","a","b","b","a","a","b","a","a","c","d","c")
    b <- c(12,22,44,22,33,55,33,66,88,55,33,66,77)
    df <- data.frame(user = a, touchpoint = b)
    
    # With tidyr >= 1.0.0 this call is written as nest(data = touchpoint)
    df %>% nest(touchpoint)
    
    #   user                       data
    # 1    a 12, 22, 44, 55, 33, 88, 55
    # 2    b                 22, 33, 66
    # 3    c                     33, 77
    # 4    d                         66
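
Nesting groups the touchpoints per user, but it does not yet give the consecutive pairs the question asks for. The following is a minimal base R sketch of one way to build them, not the answerer's method. It also avoids three problems in the question's loop: `nrows()` does not exist (the base function is `nrow()`), `} return(pairdf)` on one line is a syntax error, and `1:(length(x)-1)` counts down from 1 to 0 for a single-element journey such as `$d`. The function name `pairing_function` and the column names `Source` and `Target` are assumptions chosen for illustration; `Source` and `Target` are the headers Gephi looks for in an edge-list CSV.

```r
# Sample data from the question
a <- c("a","a","a","b","b","a","a","b","a","a","c","d","c")
b <- c(12,22,44,22,33,55,33,66,88,55,33,66,77)
b1 <- tapply(b, a, list)

# Turn one journey (a vector of touchpoints) into consecutive pairs.
# head(x, -1) drops the last element and tail(x, -1) drops the first,
# so row i holds touchpoints i and i + 1. A journey with fewer than
# two touchpoints yields a zero-row data frame.
pairing_function <- function(x) {
  data.frame(Source = head(x, -1), Target = tail(x, -1))
}

# Apply per user, then stack the per-user results into one edge list
edges <- do.call(rbind, lapply(b1, pairing_function))
rownames(edges) <- NULL  # drop the "a.1", "a.2", ... row names
edges
```

For the sample data this gives nine rows, starting with 12 22 and ending with 33 77; user `d` contributes no edge because that journey has a single touchpoint. The result could then be exported with something like `write.csv(edges, "edges.csv", row.names = FALSE)`, where the file name is only a placeholder.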