
R: subset a different set of rows (based on index) from each dataframe in a list using lapply/mapply or tidyverse


I have a list of 40 dataframes. Each dataframe in the list has the same variables but a different number of observations. I would like to cut some of the early rows and late rows from each dataframe, but the number of rows to cut differs for each one; I will have the row indices as vectors. That is, I would like to subset from row X to row Y in each dataframe: say, keep rows 5 to 20 in the first dataframe, rows 7 to 12 in the second dataframe, rows 1 to 18 in the third dataframe, etc.

I am hoping for a way to accomplish this easily using lapply/mapply, or perhaps purrr or another tidyverse solution.

Here is a reproducible example along with what I've tried and where I've gotten stuck:

# This creates a list of 3 dataframes, each of which has 32 rows and 11 variables
listexample <- list(mtcars,mtcars,mtcars)

# I would like to subset each of the dataframes to only a certain number of the rows.  This line works to subset all 3 of the dataframes to the last 20 rows of 11 variables
listexample_short <- lapply(listexample,tail,20)

So far so good. But I haven't been able to cut the rows differently for each dataframe.

# However, what I actually want is to subset each of the three dataframes differently.  So, let's say, the final 20, 10 and 12 rows of the 3 dataframes
cutlength <- c(20,10,12)

These are the things I've tried to achieve this outcome, and the error/outcome is listed below:

listexample_short <- lapply(listexample,tail,cutlength)
# Error in tail.data.frame(X[[i]], ...) : 
# invalid 'n' - length(n) must be <= length(dim(x)), got 3 > 2

listexample_short <- lapply(listexample, function(x) x[1:cutlength, ])
# this cuts every dataframe to rows 1:20, not 1:20, 1:10, and 1:12

# Warning messages:
# 1: In 1:cutlength :
#   numerical expression has 3 elements: only the first used
# 2: In 1:cutlength :
#   numerical expression has 3 elements: only the first used
# 3: In 1:cutlength :
#   numerical expression has 3 elements: only the first used

listexample_short <- mapply(tail,listexample,cutlength)
# this created a list of 33 (11 rows with vectors of varying lengths for each of 3 columns, with each column being one of the 3 dataframes from the original list)

listexample_short <- mapply(tail,listexample,list(90,100,10))
# this created a list of 33 (11 rows with vectors of varying lengths for each of 3 columns, with each column being one of the 3 dataframes from the original list)

Further, I don't only want to take the tail. As I mentioned above, I would also like to cut off rows at the beginning of each dataframe, and I'm unsure how to go about doing that.

Thank you in advance for any help!


Solution

  • You can use mapply. Just define a list of row-index vectors, one per dataframe, giving the rows that you want to keep.

    cutlengths <- list(3:4, 4:6, 5:7)
    
    mapply(\(data, cuts) data[cuts,], 
           data=listexample, cuts=cutlengths, SIMPLIFY = FALSE)
    

    [[1]]
                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    
    [[2]]
                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    
    [[3]]
                       mpg cyl disp  hp drat   wt  qsec vs am gear carb
    Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.46 20.22  1  0    3    1
    Duster 360        14.3   8  360 245 3.21 3.57 15.84  0  0    3    4
    

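    We need to specify SIMPLIFY = FALSE in mapply; otherwise it tries to simplify the result and returns a matrix (here an 11 x 3 matrix of list elements, which is the "list of 33" seen in the question) instead of a list of dataframes.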
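
If the start and end rows are held as two separate vectors, as the question describes, the same idea can be sketched with Map(), which behaves like mapply() with SIMPLIFY = FALSE. The names first_row and last_row are hypothetical; the values are the "5 to 20, 7 to 12, 1 to 18" example from the question. This is a minimal sketch, not the only way to do it.

```r
# Same setup as in the question: a list of three copies of mtcars
listexample <- list(mtcars, mtcars, mtcars)

# Hypothetical start/end rows, one pair per dataframe (values from the question)
first_row <- c(5, 7, 1)
last_row  <- c(20, 12, 18)

# Map() iterates over the list and both vectors in parallel and always
# returns a list, so no SIMPLIFY argument is needed
listexample_short <- Map(
  function(data, from, to) data[from:to, , drop = FALSE],
  listexample, first_row, last_row
)

# Number of rows kept in each dataframe
sapply(listexample_short, nrow)

# The tail-only attempt from the question also works once each dataframe
# is paired with its own n
cutlength <- c(20, 10, 12)
listexample_tail <- Map(tail, listexample, cutlength)
sapply(listexample_tail, nrow)
```

With purrr, purrr::map2() pairs a list with one vector of arguments in the same way, and purrr::pmap() handles more than two parallel inputs.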