Search code examples
rdataframecountnested-loops

Search a certain value in multiple dataframes and storage records


I believe this one is a complex problem and I will try to make it easy to understand.

I have 3 dataframes such as:

NS_3<-as.data.frame(cbind(c("3","3","3","3","3"),c("341007","325001","324003","524302","346002")))
NS_4<-as.data.frame(cbind(c("4","4","4","4","4","4","4"),c("341007","270001","270001","521009","346001","524302","335104")))

NS_15<-as.data.frame(cbind(c("15","15","15","15","15"),c("301001","301001","316104","344003","291003")))

names(NS_3)<-c("NS", "Pred FAILCODE TEST")
names(NS_4)<-c("NS", "Pred FAILCODE TEST")
names(NS_15)<-c("NS", "Pred FAILCODE TEST")

image of the three dataframes

What I would like to do is to:

1) Check if dataframes NS_4 and NS_15 contain the values of each row of NS_3$Pred FAILCODE TEST.

2) If this value is present in a certain dataframe, then it should count and storage all the values of the Pred FAILCODE TESTof this dataframe, except for the value found.

For instance: For the first Pred FAILCODE TEST value in NS_3, check if 341007 is present in NS_4 and NS_15.

Once this check is TRUE in NS_4, then it should count the frequency of all the NS_4$Pred FAILCODE TEST values, except for the one in question (which is 341007).

Hence, the result for the first loop should be

Results for the first loop 341007

For the second and third value of NS_3$Pred FAILCODE TEST, as neither 325001 nor 324003 appear at any dataframe, they should not be considered.

For the fourth value 524302, the result should be something like:

FAILCODES 524302
341007    1
270001    2
521009    1
346001    1
335104    1

Once the loop ends with NS_3$Pred FAILCODE TEST values, then it should do the same thing to NS_4$Pred FAILCODE TEST values, searching them in NS_3 and NS_15. After done with NS_4, it should do the same with NS_15, searching the NS_15$Pred FAILCODE TEST values at NS_3 and NS_4.

I believe it will require nested for-loops to go through each row of each dataframe. Also, dflist<-list(df1=NS_3,df2=NS_4,df3=NS_15) would be probably helpful in those loops.

Actually I have around 70 different dataframes and 50 different Pred FAILCODE TEST values to check in each dataframe.

I hope it is clear, if you guys need more infos please let me know!


Solution

  • Think this does it,

    #your code
    NS_3<-as.data.frame(cbind(c("3","3","3","3","3"),c("341007","325001","324003","524302","346002")))
    NS_4<-as.data.frame(cbind(c("4","4","4","4","4","4","4"),c("341007","270001","270001","521009","346001","524302","335104")))
    NS_15<-as.data.frame(cbind(c("15","15","15","15","15"),c("301001","301001","316104","344003","291003")))
    
    names(NS_3)<-c("NS", "Pred FAILCODE TEST")
    names(NS_4)<-c("NS", "Pred FAILCODE TEST")
    names(NS_15)<-c("NS", "Pred FAILCODE TEST")
    
    #Make a vector of your Tables suffixes
    df_index <- c(3,4,15)
    
    #Essentially rbind() all of tables in your df_index 
    #there is probably an elegant way to do this with do.call()
    input <- eval(parse(text = paste0("rbind(",  
                         paste0("NS_", df_index, collapse = ","), 
                         ")")
           )
     )
    
    require(dplyr)
    require(magrittr)
    
    #convert from factor to numeric
    input$`Pred FAILCODE TEST` <- as.numeric(as.character(input$`Pred FAILCODE TEST`))
    input$NS <- as.numeric(as.character(input$NS))
    
    #make a compressed table of frequencies
    input %>% group_by(NS, `Pred FAILCODE TEST`) %>% 
    summarize(n=n()) -> compressTBL
    
    #little function to look up each record and compare
    Lookup <- function(NS, FailCode){
      input$NS[input$`Pred FAILCODE TEST` == FailCode & !input$NS == NS]
    }
    
    #the output, a list, each column is row in your input table
    output <- sapply(X = 1:nrow(input), 
       FUN = function(x){
       compressTBL[compressTBL$NS == Lookup(input$NS[x], input$`Pred FAILCODE TEST`[x]),]
       })
    
    #The only records with values are 1,4,6,11
    output
    
    #same as what you got in your loop
    as.data.frame(output[,4]) #4th record 524302