Tags: r, list, apply, lapply, sapply

Remove elements from a list by condition


I have a list of 108 data frames, let's say it's called "LDF". Every data frame in the list has the same column "Value", among others. What I need to tell R is:

if sum(Value) for a data frame in the list is greater than 0, keep that element in the list; otherwise, drop it.

Basically, I should end up with about 104 data frames at the end of the process.

I'd like to avoid a for loop. Can someone think of a solution using the apply family?

I tried:

LDF <- LDF[sapply(LDF$Value, sum) > 0]

but got a 'List of 0' as the result.

Sample data:

LDF <- list(structure(list(Date = structure(c(18765, 18767, 18778, 18778, 
18779, 18787, 18795, 18809, 18809, 18809, 18820, 18821, 18848, 
18864, 18871, 18880, 18885, 18886), class = "Date"), Value = c(120000, 
40000, 55000, -11.38, -115091.86, 30000, 98400, 1720, 50000, 
-50062.58, -2502.82, -20021.71, 28619.27, 45781.12, 14953.83, 
-6017.31, -3310.73, -140372.91)), row.names = c(NA, -18L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(Date = structure(c(18820, 
18820, 18820, 18820, 18820, 18821, 18857, 18857, 18857, 18857, 
18857, 18857, 18858, 18871, 18871, 18887, 18887, 18890, 18890
), class = "Date"), Value = c(41000, 41000, 122754.88, 41000, 
41000, 82000, -41080.42, -41432.51, -160308.38, -120504.54, -37214.87, 
-76707.98, -42592.41, -41248.63, -41824.33, -120572.42, -37472.61, 
-79312, -34830.47)), row.names = c(NA, -19L), class = c("tbl_df", 
"tbl", "data.frame")))

Solution

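  • We need to extract the column inside the function passed to sapply. LDF is a list of data.frames/tibbles, so LDF$Value looks for a list element named Value, finds none, and returns NULL. sapply then has nothing to iterate over, which is why the result is a 'List of 0'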

    i1 <- sapply(LDF, function(x) sum(x$Value)) > 0
    LDF[i1]
    

    Output

    [[1]]
    # A tibble: 18 x 2
       Date           Value
       <date>         <dbl>
     1 2021-05-18  120000  
     2 2021-05-20   40000  
     3 2021-05-31   55000  
     4 2021-05-31     -11.4
     5 2021-06-01 -115092. 
     6 2021-06-09   30000  
     7 2021-06-17   98400  
     8 2021-07-01    1720  
     9 2021-07-01   50000  
    10 2021-07-01  -50063. 
    11 2021-07-12   -2503. 
    12 2021-07-13  -20022. 
    13 2021-08-09   28619. 
    14 2021-08-25   45781. 
    15 2021-09-01   14954. 
    16 2021-09-10   -6017. 
    17 2021-09-15   -3311. 
    18 2021-09-16 -140373. 
    

    To check which elements were dropped, negate (!) the logical vector

    which(!i1)
    

    gives the positions of the dropped elements, while the following returns the elements themselves

    LDF[!i1]
    

    Or use Filter from base R (the \(x) shorthand for function(x) requires R 4.1.0 or later)

    Filter(\(x) sum(x$Value) > 0, LDF)
    

    Or with keep from purrr

    library(purrr)
    keep(LDF, ~ sum(.x$Value) > 0)
    

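    The opposite is discard, which drops the elements that satisfy the condition and so returns the data frames whose sum is not greater than 0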

    discard(LDF, ~ sum(.x$Value) > 0)
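
To see why the original attempt returned an empty list, and how the logical index splits the list into kept and dropped parts, here is a minimal base R sketch. The list `toy` and the names `sums`, `keep_idx`, `kept` and `dropped` are made up for illustration and are not from the question. It uses vapply in place of sapply, which is a swapped-in variant: it behaves the same here but errors if any result is not a single number.

```r
# Hypothetical toy data standing in for the 108 data frames in the question.
toy <- list(
  a = data.frame(Value = c(10, -3)),  # sum is 7, should be kept
  b = data.frame(Value = c(5, -8)),   # sum is -3, should be dropped
  c = data.frame(Value = c(2, 2))     # sum is 4, should be kept
)

# `$` on the list looks for a list element named "Value", not a column
# inside each data frame, so it returns NULL.
is.null(toy$Value)

# sapply() over NULL has nothing to iterate over and returns an empty list,
# so indexing with it selects nothing.
length(sapply(toy$Value, sum))

# Extract the column inside the function instead. vapply() checks that
# every result is exactly one numeric value.
sums <- vapply(toy, function(x) sum(x$Value), numeric(1))
keep_idx <- sums > 0

kept    <- toy[keep_idx]   # elements whose sum is greater than 0
dropped <- toy[!keep_idx]  # everything else

names(kept)
names(dropped)
```

If a Value column could contain NA, the sum would be NA and the comparison would no longer give a clean TRUE/FALSE. The question does not say whether that applies; if it does, sum(x$Value, na.rm = TRUE) is one way to handle it.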