Tags: r, dataframe, dplyr, statistics, summary

Rowwise statistics of a dataframe containing several columns with NAs in R


Overview of Problem

Hello R experts,

I would appreciate your support in solving this. I am trying to compute row-wise statistics for a large data frame (15k rows, 700 columns) that includes NAs. I want to add the following as new columns: min, max, mean, median, standard deviation, variance, 10th percentile, 30th percentile, 70th percentile, 90th percentile.

Where there are NAs, the computation should skip them. I used na.rm = True in dplyr's rowwise command, with no success.

Code to Load subset of Dataframe

#Please note that the real dataframe has hundreds of columns, so typing each column won't be possible

df<- data.frame(a1=c(1,NA,0,4), a2=c(NA,1,0,6), a3=c(NA,NA,9,3),a4=c(1,NA,NA,4), a5=c(4,NA,NA,6), a6=c(7,NA,9,3),a7=c(1,1,1,1),a8=c(2,2,2,2), a9=c(4,3,3,6), a10=c(7,4,9,3))
df
  a1 a2 a3 a4 a5 a6 a7 a8 a9 a10
1  1 NA NA  1  4  7  1  2  4   7
2 NA  1 NA NA NA NA  1  2  3   4
3  0  0  9 NA NA  9  1  2  3   9
4  4  6  3  4  6  3  1  2  6   3

Expected Output

I would like to get the statistics mentioned above for each row. I get errors when using dplyr's rowwise computation due to the NAs, despite using the argument "na.rm = True".

df
  a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 min mean median max sd variance per10 per30 per70 per90
1  1 NA NA  1  4  7  1  2  4   7   x    x      x   x  x        x     x     x     x     x
2 NA  1 NA NA NA NA  1  2  3   4   x    x      x   x  x        x     x     x     x     x
3  0  0  9 NA NA  9  1  2  3   9   x    x      x   x  x        x     x     x     x     x
4  4  6  3  4  6  3  1  2  6   3   x    x      x   x  x        x     x     x     x     x

Thanks in anticipation for your help and in growing the R community


Solution

  • 
    # Snapshot the original columns first, so that statistics already
    # added to df are not fed into the later calculations
    vals<-as.matrix(df)
    df$min<-apply(vals,1,min,na.rm=TRUE)
    df$mean<-apply(vals,1,mean,na.rm=TRUE)
    df$median<-apply(vals,1,median,na.rm=TRUE)
    df$max<-apply(vals,1,max,na.rm=TRUE)
    df$sd<-apply(vals,1,sd,na.rm=TRUE)
    df$variance<-apply(vals,1,var,na.rm=TRUE)
    df$per10<-apply(vals,1,quantile,probs=0.1,na.rm=TRUE)
    df$per30<-apply(vals,1,quantile,probs=0.3,na.rm=TRUE)
    df$per70<-apply(vals,1,quantile,probs=0.7,na.rm=TRUE)
    df$per90<-apply(vals,1,quantile,probs=0.9,na.rm=TRUE)
    

    Of course, you could also iterate over a vector of the functions (for example with "eval(parse(...))") for the same result with less code.
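
    One way to do that iteration without eval(parse(...)) is to loop over a named list of functions. This is only a minimal sketch in base R: the names stat_funs, vals, stats and result are made up for illustration, and the data frame is the small example from the question, not the real 15k x 700 one.

```r
# Example data from the question
df <- data.frame(a1 = c(1, NA, 0, 4), a2 = c(NA, 1, 0, 6), a3 = c(NA, NA, 9, 3),
                 a4 = c(1, NA, NA, 4), a5 = c(4, NA, NA, 6), a6 = c(7, NA, 9, 3),
                 a7 = c(1, 1, 1, 1), a8 = c(2, 2, 2, 2), a9 = c(4, 3, 3, 6),
                 a10 = c(7, 4, 9, 3))

# Hypothetical helper list: each name becomes the name of a new column,
# and every function drops NAs before computing its statistic
stat_funs <- list(
  min      = function(x) min(x, na.rm = TRUE),
  mean     = function(x) mean(x, na.rm = TRUE),
  median   = function(x) median(x, na.rm = TRUE),
  max      = function(x) max(x, na.rm = TRUE),
  sd       = function(x) sd(x, na.rm = TRUE),
  variance = function(x) var(x, na.rm = TRUE),
  per10    = function(x) quantile(x, probs = 0.1, na.rm = TRUE, names = FALSE),
  per30    = function(x) quantile(x, probs = 0.3, na.rm = TRUE, names = FALSE),
  per70    = function(x) quantile(x, probs = 0.7, na.rm = TRUE, names = FALSE),
  per90    = function(x) quantile(x, probs = 0.9, na.rm = TRUE, names = FALSE)
)

# Work on a matrix copy of the original columns only, so that no
# computed statistic leaks into another statistic's input
vals <- as.matrix(df)

# One column of results per function, one row per row of df
stats <- sapply(stat_funs, function(f) apply(vals, 1, f))

# Append the statistics to the original columns
result <- cbind(df, stats)
result
```

    For row 2, whose non-missing values are 1, 1, 2, 3 and 4, this gives min 1, mean 2.2, median 2 and max 4. Note that R spells the logical constant TRUE (or T); True, as written in the question, is not a defined object, which may be one reason the dplyr attempt failed.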