I have a list of 40 dataframes. Each dataframe in the list has the same variables but a different number of observations. I would like to cut out some of the early and late rows of each dataframe, but the number of rows cut will differ for each one, and I will have the row indices as vectors. That is, I would like to subset from row X to row Y in each dataframe: say, keep rows 5 to 20 in the first dataframe, rows 7 to 12 in the second dataframe, rows 1 to 18 in the third dataframe, etc.
I am hoping for a way to accomplish this easily using lapply/mapply, or perhaps purrr or other tidyverse solutions.
Here is a reproducible example along with what I've tried and where I've gotten stuck:
# This creates a list of 3 dataframes, each with 32 rows and 11 variables
listexample <- list(mtcars,mtcars,mtcars)
# I would like to subset each of the dataframes to only a certain number of the rows. This line works to subset all 3 of the dataframes to the last 20 rows of 11 variables
listexample_short <- lapply(listexample,tail,20)
So far so good. But I haven't been able to cut the rows differently for each dataframe.
# However, what I actually want is to subset each of the three dataframes differently. So, let's say, the final 20, 10 and 12 rows of the 3 dataframes
cutlength <- c(20,10,12)
These are the things I've tried to achieve this outcome, and the error/outcome is listed below:
listexample_short <- lapply(listexample,tail,cutlength)
# Error in tail.data.frame(X[[i]], ...) :
# invalid 'n' - length(n) must be <= length(dim(x)), got 3 > 2
listexample_short <- lapply(listexample, function(x) x[1:cutlength, ])
# this cuts every dataframe to rows 1:20, not 1:20, 1:10, and 1:12
# Warning messages:
# 1: In 1:cutlength :
# numerical expression has 3 elements: only the first used
# 2: In 1:cutlength :
# numerical expression has 3 elements: only the first used
# 3: In 1:cutlength :
# numerical expression has 3 elements: only the first used
listexample_short <- mapply(tail,listexample,cutlength)
# this created a list of 33 elements arranged as a matrix: 11 rows (one per variable) by 3 columns (one per dataframe from the original list), with each cell holding a vector of varying length
listexample_short <- mapply(tail,listexample,list(90,100,10))
# this also created a list of 33 elements arranged as a matrix: 11 rows (one per variable) by 3 columns (one per dataframe from the original list), with each cell holding a vector of varying length
Further, I don't only want to take the tail: as I mentioned above, I would also like to cut off rows at the beginning of each dataframe, and I'm unsure how to go about doing that.
Thank you in advance for any help!
You can use mapply. Just define a list containing the rows that you want to keep.
cutlengths <- list(3:4, 4:6, 5:7)
mapply(\(data, cuts) data[cuts,],
data=listexample, cuts=cutlengths, SIMPLIFY = FALSE)
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
[[2]]
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
[[3]]
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
Duster 360 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
We need to specify SIMPLIFY = FALSE in mapply, otherwise it would return a matrix.
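If the cut points are stored as two vectors (a first row and a last row per dataframe) rather than as a list of ranges, the same idea can be sketched with Map, which is a thin wrapper around mapply that never simplifies its result. The names first_row, last_row, listexample_cut and listexample_tail below are made up for illustration, and the values are the ones from the question; this is only a sketch, not the only way to do it.

```r
# Same example data as in the question
listexample <- list(mtcars, mtcars, mtcars)

# Hypothetical start and end rows, one pair per dataframe
first_row <- c(5, 7, 1)
last_row  <- c(20, 12, 18)

# Map() calls mapply() without simplifying, so the result is always a list
listexample_cut <- Map(
  function(data, from, to) data[from:to, , drop = FALSE],
  listexample, first_row, last_row
)

# Number of rows kept in each dataframe
sapply(listexample_cut, nrow)
# [1] 16  6 18

# The tail-only attempt from the question works the same way
cutlength <- c(20, 10, 12)
listexample_tail <- Map(tail, listexample, cutlength)
sapply(listexample_tail, nrow)
# [1] 20 10 12
```

This also shows why mapply(tail, listexample, cutlength) in the question produced an odd result: the rows were cut correctly, but the default SIMPLIFY = TRUE then collapsed the three dataframes into a single matrix of list cells.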