
R: map() returning two different data.frames, combined with bind_rows


I have a list of .csv files that I am trying to filter one by one (I can't filter after combining them, because there is too much data to load at once).

I want:

  • To filter each file, then combine the filtered rows into one single data.frame
  • To keep a count of the rows before and after filtering

Here is a (fake) example of my data:

library(tidyverse)
df_list=data.frame(a=seq(1,20,1), b=seq(41,60,1), c=seq(81,100,1)) %>% map(~{ 
  data.frame( a=.x, b=.x*2, c=.x*3)})

I then managed to do this:

regrouped_data=df_list %>% map(~{
# Filter
  d2=.x %>% filter(a>5) 
# Count
  print(
    tribble(~date,~initial,~final,
            "name",nrow(.x),nrow(d2)
            )
  )
  return(d2)
}) %>% bind_rows()

The problem is: the filtered data is combined, but the count tables are only printed one by one. I need all of those count tables assembled into a single one as well (because I have a lot of files to filter). How can I do that?


Solution

  • It can be nice to lay everything out in a straightforward loop so the logic is clear:

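    # Takes the list of data frames as an argument instead of reading a global
    filterCount <- function(df_list){

      # seq_along() is the idiomatic way to index over a list
      for(i in seq_along(df_list)){

        # Filter the i-th data frame
        data_flt <- df_list[[i]] %>%
          filter(a>5)

        # One-row summary: rows before and after filtering
        count_flt <- tibble(date = i,
                            nrow.total = nrow(df_list[[i]]),
                            nrow.flt = nrow(data_flt))

        if(i == 1){

          # First iteration: initialise both outputs
          data_out <- data_flt
          count_out <- count_flt

        } else {

          # Later iterations: append to both outputs
          data_out <- bind_rows(data_out, data_flt)
          count_out <- bind_rows(count_out, count_flt)

        }

      }

      # Named list, so the results can be read as $data and $count
      return(list(data = data_out, count = count_out))

    }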
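
  • For comparison, here is a minimal sketch that stays closer to the purrr style used in the question. It is only one possible approach. Each call returns a small list holding both the filtered data and a one-row count table, and the two pieces are bound separately afterwards. The names `filter_one` and `regrouped_count` are hypothetical (they are not from the question), and the `date` column simply holds the list element's name as a stand-in for whatever identifies each file.

```r
library(dplyr)
library(purrr)

# Same fake data as in the question: a named list of three data frames
df_list <- data.frame(a = seq(1, 20, 1), b = seq(41, 60, 1), c = seq(81, 100, 1)) %>%
  map(~ data.frame(a = .x, b = .x * 2, c = .x * 3))

# Hypothetical helper: filter one data frame and build its one-row count table
filter_one <- function(df, name) {
  kept <- df %>% filter(a > 5)
  list(
    data  = kept,
    count = tibble(date = name, initial = nrow(df), final = nrow(kept))
  )
}

# imap() passes each element and its name to the helper
results <- imap(df_list, filter_one)

# Pull out each piece by name and bind the pieces separately
regrouped_data  <- results %>% map("data") %>% bind_rows()
regrouped_count <- results %>% map("count") %>% bind_rows()
```

    With real files, the helper could presumably read each file itself (for example with readr::read_csv) before filtering, so that only one unfiltered file is in memory at a time. That part is an assumption about the setup, not something tested here.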