Tags: r, sum, extract, multiple-columns

In R, how do you sum columns from a fixed position onward when the number of columns varies between datasets?


I have a list of 52 data frames ("locs" is my list of data frames). For each one, I want the column sums for column 9 through the last column, collected into a new data frame. The total number of columns differs from one data frame to the next.

Here is what I have tried using a for loop:

summaryofsums <- vector("list",1) #empty vector

for (df in 1:length(locs)){
  newdf <- df[, colSums(df!= 0) > 0] #get rid of all columns that have only 0s
  newdfsum <- colSums(newdf[,9:length(newdf)])  
  summaryofsums[i] <- newdfsum
}

I receive the following error:

Error in colSums(df != 0) : 
  'x' must be an array of at least two dimensions

version
               _
platform       x86_64-apple-darwin15.6.0
arch           x86_64
os             darwin15.6.0
system         x86_64, darwin15.6.0
status
major          3
minor          5.3
year           2019
month          03
day            11
svn rev        76217
language       R
version.string R version 3.5.3 (2019-03-11)
nickname       Great Truth

Thank you!!


Solution

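  • In the loop, df takes the integer values 1, 2, ..., length(locs) rather than the data frames themselves, so colSums() receives a single number and raises the error. Using sapply, which passes each data frame to the function directly:

    result <- sapply(locs, function(df) {
      # drop columns that contain only zeros (drop = FALSE keeps a data frame)
      newdf <- df[, colSums(df != 0, na.rm = TRUE) > 0, drop = FALSE]
      # sum column 9 through the last remaining column
      colSums(newdf[, 9:ncol(newdf), drop = FALSE], na.rm = TRUE)
    })

    # a matrix only if every dataset keeps the same number of columns,
    # otherwise a list with one named vector per dataset
    result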
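
  • Keeping the for loop: a minimal sketch, not the asker's real data. The toy locs below is hypothetical, and it assumes the list is named and that every data frame has at least 9 columns. Unlike the sapply version, it selects column 9 onward first and only then drops the all-zero columns, so an all-zero column among the first 8 cannot shift the positions. That ordering is a design choice, not something the question specifies. The last step stacks the per-dataset sums into one long data frame, since the number of summed columns differs between datasets.

```r
# Hypothetical stand-in for `locs`: 8 leading columns, then a varying
# number of value columns (column `y` in `a` is all zeros).
locs <- list(
  a = data.frame(matrix(1, nrow = 3, ncol = 8), x = 1:3, y = 0, z = 4:6),
  b = data.frame(matrix(1, nrow = 3, ncol = 8), x = 7:9, w = c(1, 0, 2))
)

# One slot per dataset, filled inside the loop
summaryofsums <- vector("list", length(locs))
names(summaryofsums) <- names(locs)

for (i in seq_along(locs)) {
  df <- locs[[i]]                                  # the data frame, not its index
  tail_cols <- df[, 9:ncol(df), drop = FALSE]      # column 9 through the last
  keep <- colSums(tail_cols != 0, na.rm = TRUE) > 0  # FALSE for all-zero columns
  summaryofsums[[i]] <- colSums(tail_cols[, keep, drop = FALSE], na.rm = TRUE)
}

# Stack the named vectors into one long data frame:
# one row per (dataset, column) pair
summary_df <- do.call(rbind, lapply(names(summaryofsums), function(nm) {
  s <- summaryofsums[[nm]]
  data.frame(dataset = nm, column = names(s), total = unname(s),
             stringsAsFactors = FALSE)
}))

summary_df
```

    The long layout avoids padding with NA when datasets have different columns. If every dataset were known to share the same column names, the list could be bound row-wise into a wide table instead.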