Overview of Problem
Hello R experts,
I would appreciate your support in solving this. I am trying to compute row-wise statistics on a large data frame (15k rows, 700 columns) that includes NAs. I want to calculate the following as new columns: min, max, mean, median, standard deviation, variance, 10th percentile, 30th percentile, 70th percentile, 90th percentile.
Where there are NAs, the computation should skip them. I used na.rm = True with dplyr's rowwise command, without success.
Code to Load subset of Dataframe
# Please note that the real data frame has hundreds of columns, so typing each column name isn't feasible
df <- data.frame(a1 = c(1, NA, 0, 4), a2 = c(NA, 1, 0, 6), a3 = c(NA, NA, 9, 3), a4 = c(1, NA, NA, 4), a5 = c(4, NA, NA, 6), a6 = c(7, NA, 9, 3), a7 = c(1, 1, 1, 1), a8 = c(2, 2, 2, 2), a9 = c(4, 3, 3, 6), a10 = c(7, 4, 9, 3))
df
a1 a2 a3 a4 a5 a6 a7 a8 a9 a10
1 1 NA NA 1 4 7 1 2 4 7
2 NA 1 NA NA NA NA 1 2 3 4
3 0 0 9 NA NA 9 1 2 3 9
4 4 6 3 4 6 3 1 2 6 3
Expected Output
I would like to get the statistics mentioned above for each row. I get errors when using dplyr's rowwise computation because of the NAs, despite using the argument "na.rm = True".
df
a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 min mean median max sd variance per10 per30 per70 per90
1 1 NA NA 1 4 7 1 2 4 7 x x x x x x x x x x
2 NA 1 NA NA NA NA 1 2 3 4 x x x x x x x x x x
3 0 0 9 NA NA 9 1 2 3 9 x x x x x x x x x x
4 4 6 3 4 6 3 1 2 6 3 x x x x x x x x x x
Thanks in anticipation for your help and in growing the R community
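# Snapshot the original columns first; otherwise each new column (min, mean, ...)
# would be included in the statistics computed after it
vals <- as.matrix(df)
df$min <- apply(vals, 1, min, na.rm = TRUE)
df$mean <- apply(vals, 1, mean, na.rm = TRUE)
df$median <- apply(vals, 1, median, na.rm = TRUE)
df$max <- apply(vals, 1, max, na.rm = TRUE)
df$sd <- apply(vals, 1, sd, na.rm = TRUE)
df$variance <- apply(vals, 1, var, na.rm = TRUE)
df$per10 <- apply(vals, 1, quantile, probs = 0.1, na.rm = TRUE)
df$per30 <- apply(vals, 1, quantile, probs = 0.3, na.rm = TRUE)
df$per70 <- apply(vals, 1, quantile, probs = 0.7, na.rm = TRUE)
df$per90 <- apply(vals, 1, quantile, probs = 0.9, na.rm = TRUE)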
Of course, you could also iterate over a vector of the functions (for example with "eval(parse..)") to get the same result with less code.
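A minimal sketch of that iteration idea, using a named list of functions with sapply() rather than eval(parse..). The names vals, stats and res are introduced here for illustration and are not from the original post. It assumes every column in df is numeric and takes a snapshot of the input columns first, so that newly added columns are never fed back into later statistics.

```r
# Example data from the question (a subset of the real data frame)
df <- data.frame(a1 = c(1, NA, 0, 4), a2 = c(NA, 1, 0, 6), a3 = c(NA, NA, 9, 3),
                 a4 = c(1, NA, NA, 4), a5 = c(4, NA, NA, 6), a6 = c(7, NA, 9, 3),
                 a7 = c(1, 1, 1, 1), a8 = c(2, 2, 2, 2), a9 = c(4, 3, 3, 6),
                 a10 = c(7, 4, 9, 3))

# Freeze the input columns as a numeric matrix (assumes all columns are numeric)
vals <- as.matrix(df)

# Named list of summaries; the list names become the new column names.
# The percentile wrappers forward na.rm through `...`.
stats <- list(
  min      = min,
  mean     = mean,
  median   = median,
  max      = max,
  sd       = sd,
  variance = var,
  per10    = function(x, ...) quantile(x, probs = 0.1, names = FALSE, ...),
  per30    = function(x, ...) quantile(x, probs = 0.3, names = FALSE, ...),
  per70    = function(x, ...) quantile(x, probs = 0.7, names = FALSE, ...),
  per90    = function(x, ...) quantile(x, probs = 0.9, names = FALSE, ...)
)

# One apply() call per statistic; each call returns one value per row
res <- sapply(stats, function(f) apply(vals, 1, f, na.rm = TRUE))

# Append the statistics to the original data frame
df <- cbind(df, res)
```

Note that na.rm must be spelled TRUE (or T); R does not define True. For a row that contains only NAs, min and max with na.rm = TRUE return Inf and -Inf with a warning, so such rows may need separate handling.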