
Finding patterns in a list of dataframes using R


I have a list L of dataframes where each dataframe consists of the variable Var and one observation that consists of different numbers. Each number in each observation belongs to the set {1,2,3,4,5,11,12,13,14,15}. L could, as an example, look like:

> L 
[[1]]
                   Var
1 "3", "11", "1", "15", 

[[2]]
                   Var
1 "5", "13", "3", "12", 

[[3]]
                   Var
1 "4", "1", "2", "5", 

The problem I am trying to solve is the following. I want to know whether there is a positive number x in {1,2,3,4,5} such that, when added to each number in a given observation, it makes that observation equivalent to another. For example, consider the first two elements of L and let x = 2. Adding x to the first element of L yields:

> L[[1]]
                   Var
1 "5", "13", "3", "17", 

The number 17 is not in the set defined above, so I want the following constraints to apply when adding x. Let y denote a number in an observation in a dataframe of L:

if y + x > 15 then subtract 5
if 5 < y + x < 11 then subtract 5

The same example with these constraints would yield:

> L[[1]]
                   Var
1 "5", "13", "3", "12", 

And L[[1]] would become equivalent to L[[2]]. In my world, L[[1]] and L[[2]] share the same pattern. What I want to do is to match elements of L based on equivalent (in the sense described above) patterns and sort the groups according to "the number of members". So in the example here I'd like to detect that L[[1]] and L[[2]] are in one group and that this is the group with most members, followed by the next group, that in this example only consists of L[[3]]. I am very new to R and any guidance would be appreciated!


Solution

  • It looks like your "constraints" define a mathematical equivalence relation. That means that your groups are really equivalence classes in the mathematical sense and that you can define a unique representative for each group. If you do this, you can easily check for equivalence (= elements belonging to the same group) by comparing their representatives.

    Let's define the representative as the member of the equivalence class that starts with "1": for each list element we add the integer in 1:5 that, under your constraints, turns its first value into 1. (This assumes the first value lies in 1:5, as in your example; a value in 11:15 can never be shifted to 1.) We do this for every element of your list L and then compare which elements have the same representative.
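
    Before relying on representatives, the equivalence itself can be checked directly from the definition in the question. The following is only a brute-force sketch: shiftPattern and isEquivalent are hypothetical helper names, not part of the original answer, and the values are assumed to come from {1,...,5,11,...,15}.

    ```r
    ## Hypothetical helper: shift every value by x, wrapping within 1:5 and 11:15.
    shiftPattern <- function(v, x) {
        shifted <- v + x
        wrap <- (shifted > 5 & shifted < 11) | shifted > 15
        ifelse(wrap, shifted - 5, shifted)
    }

    ## Two patterns are equivalent if some shift x in 1:5 maps a onto b.
    ## x = 5 maps every pattern onto itself, so identical patterns match too.
    isEquivalent <- function(a, b) {
        any(vapply(1:5, function(x) identical(shiftPattern(a, x), b), logical(1)))
    }

    isEquivalent(c(3, 11, 1, 15), c(5, 13, 3, 12))  # TRUE (x = 2)
    isEquivalent(c(3, 11, 1, 15), c(4, 1, 2, 5))    # FALSE
    ```

    Comparing every pair this way grows quadratically with the length of the list, which is why comparing one representative per element is the more convenient route.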

    Implementation in R:

    Let's start with your list L:

    L <- list(structure(list(Var = c("3", "11", "1", "15")), .Names = "Var", 
                    row.names = c(NA, -4L), class = "data.frame"), 
            structure(list(Var = c("5", "13", "3", "12")), .Names = "Var", 
                    row.names = c(NA, -4L), class = "data.frame"), 
            structure(list(Var = c("4", "1", "2", "5")), .Names = "Var", 
                    row.names = c(NA, -4L), class = "data.frame"))
    

    First, we simplify the list by converting it into a list of numerical vectors:

    ## Simplify list: convert to list of numerical vectors:
    L2 <- lapply(L, function(x) as.numeric(x$Var))
    
    > L2
    [[1]]
    [1]  3 11  1 15
    
    [[2]]
    [1]  5 13  3 12
    
    [[3]]
    [1] 4 1 2 5
    

    Then we define a function that performs the addition following your constraints, and use it to find the representative of each element:

    ## Function to implement the addition rules:
    addConstant <- function(myVec, constant){
        outVec <- myVec + constant
        ## Wrap values that left the set back into their block (1:5 or 11:15):
        ifelse(((outVec > 5) & (outVec < 11)) | (outVec > 15),
            outVec - 5, outVec)
    }
    
    ## Define representative of equivalence class as the one starting with a "1";
    ## 6 - myVec[1] is the shift in 1:5 that maps a first value in 1:5 to 1:
    representativesList <- lapply(L2, function(myVec) addConstant(myVec, 6 - myVec[1]))
    
    > representativesList 
    [[1]]
    [1]  1 14  4 13
    
    [[2]]
    [1]  1 14  4 13
    
    [[3]]
    [1] 1 3 4 2
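
    Note that 6 - myVec[1] is only a valid shift when the first value lies in 1:5; for a first value in 11:15 it becomes negative and the wrap rules no longer apply. A possible generalization, sketched here under the assumption that all values lie in {1,...,5,11,...,15}, rotates each value within its own block using modular arithmetic. canonicalForm is a hypothetical name, not part of the original answer.

    ```r
    ## Hypothetical helper: shift a pattern so that its first value becomes the
    ## first value of its own block (1 for the block 1:5, 11 for the block 11:15).
    canonicalForm <- function(v) {
        block <- ifelse(v > 5, 10, 0)   # offset of the block each value lives in
        pos <- v - block - 1            # zero-based position within the block (0..4)
        shifted <- (pos - pos[1]) %% 5  # rotate so the first value sits at position 0
        shifted + block + 1
    }

    canonicalForm(c(3, 11, 1, 15))   # 1 14 4 13
    canonicalForm(c(5, 13, 3, 12))   # 1 14 4 13
    canonicalForm(c(13, 5, 1, 11))   # 11 3 4 14
    ```

    For patterns whose first value is in 1:5 this gives the same representatives as above, so it could be used in place of addConstant(myVec, 6 - myVec[1]) inside the lapply() call.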
    

    Now we can define the groups. In your example there are two, which we will call group1 and group2:

    ## Define groups: Unique representatives:
    groupList <- unique(representativesList)
    names(groupList) <- paste0("group", seq_along(groupList))
    
    > groupList
    $group1
    [1]  1 14  4 13
    
    $group2
    [1] 1 3 4 2
    

    Lastly, we check which group each observation belongs to:

    ## Find group:
    groupAffiliationVec <- vapply(representativesList, function(x){
                flagVec <- vapply(groupList, function(y, x) identical(x,y), logical(1), x)
                names(groupList[flagVec])           
            }, character(1))
    
    > groupAffiliationVec
    [1] "group1" "group1" "group2"
    

    We now know that observations 1 and 2 belong to the same group (group1) and that observation 3 belongs to group2. Using table(groupAffiliationVec), you can compute the number of members of each group.
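
    The question also asks for the groups ordered by their number of members. A minimal sketch, assuming the group labels computed in the last step (copied literally here so the snippet runs on its own); groupSizes and membersByGroup are illustrative names:

    ```r
    ## Group labels as produced in the last step above.
    groupAffiliationVec <- c("group1", "group1", "group2")

    ## Count members per group and order from largest to smallest.
    groupSizes <- sort(table(groupAffiliationVec), decreasing = TRUE)
    groupSizes

    ## Indices of the original list elements that fall into each group,
    ## listed in the same largest-first order.
    membersByGroup <- split(seq_along(groupAffiliationVec), groupAffiliationVec)
    membersByGroup[names(groupSizes)]
    ```

    Here membersByGroup$group1 holds the indices 1 and 2, so L[membersByGroup$group1] would return the dataframes of the largest group.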