r, dataframe, summary

Get the fraction of values that are less than a constant for all columns in a data frame


I think there should be a really easy way to do this, but I'm not that skilled in R yet. My dumb solution is just to iterate through the columns, count the number of values that are less than C, and divide that by the number of rows to get the fraction. Something like

fracs <- numeric(ncol(df))
for (j in seq_len(ncol(df))) {
  # fraction of values in column j that are less than C
  fracs[j] <- sum(df[, j] < C) / nrow(df)
}

I feel there should be a one-liner to get this kind of summary for a data frame, perhaps using dplyr or something. Ideally the output is a data frame of one row with the fractions. R masters please help.


Solution

  • This should be very simple. I think you just want colMeans(df < C).

    Edit: just to be a little more clear, df is a data frame. When we run df < C, the result is a logical matrix of the same dimensions, where each cell i,j stores the result of df[i,j] < C. colMeans then takes the column means, treating TRUE and FALSE as 1 and 0, respectively, so each mean is the fraction of values in that column that are less than C.
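
A minimal sketch of the above on toy data, in case it helps. The values of df and C here are made up for illustration, and the conversion to a one-row data frame with as.data.frame(t(...)) is just one possible way to get the output shape the question asks for:

```r
# Toy data: df and C are made-up values, used only for illustration
df <- data.frame(a = c(1, 5, 9, 2), b = c(4, 4, 8, 0))
C <- 4.5

# df < C is a logical matrix; colMeans treats TRUE as 1 and FALSE as 0,
# so each column mean is the fraction of values below C
fracs <- colMeans(df < C)
print(fracs)

# colMeans returns a named numeric vector; transpose it to get
# a data frame with a single row and one column per original column
fracs_df <- as.data.frame(t(fracs))
print(fracs_df)
```

If the data contain missing values, df < C yields NA in those cells, and colMeans(df < C, na.rm = TRUE) should give the fraction among the non-missing values instead. With dplyr 1.0 or later, something like summarise(df, across(everything(), ~ mean(.x < C))) should produce the same one-row result, assuming all columns are numeric.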