I have a data frame of kidney transplant patients with different clinical outcomes (numbers changed for confidentiality purposes). It looks something like this:
Patient eGFR1m cr1m alb1m cr3m eGFR3m alb3m cr12m eGFR12m Diseased
A 142 343 125 110 115 125 120 181 1
B 175 192 121 125 215 120 135 151 0
C 154 185 128 210 115 125 124 116 0
D 170 215 215 110 125 110 145 205 1
E 175 140 225 110 115 110 125 120 0
This is the simplified version. I have a lot more outcomes so I want to create a loop for calculating median and IQR for each column in R.
I also need the medians for the whole cohort, as well as the medians for the diseased and non-diseased groups for comparison. The disease outcome was collected as a binary (non-continuous) variable. eGFR, cr and alb at each month are all continuous, non-normally distributed variables.
It seems you want us to do all the steps of an initial exploratory data analysis for you. In future posts, instead of requesting code like this, you should first show your problem with reproducible code, show the results of your attempts, and ask specific questions about your doubts. That said, let's look at your question:
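You do not need an explicit loop: sapply applies a function to every column of a data frame, so you can use it to get the median, mean, Q1 and Q3 of each column. The columns must be numeric, so exclude identifier columns such as Patient first.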
sapply(yourdataframe, median) #will return a vector with the medians of every column
Similarly,
sapply(yourdataframe, quantile, 0.25) #will return a vector with all the first quartiles
sapply(yourdataframe, quantile, 0.75) #will return a vector with all the third quartiles
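As a minimal sketch of the calls above on the question's data, the three statistics can be collected in one sapply call by passing several probabilities to quantile. The data frame name kidney, the helper names outcomes and cohort_summary, and the choice of only three outcome columns are assumptions made for illustration; the values are the first columns of the sample table.

```r
# Toy data copied from the sample table in the question (first three outcomes only)
kidney <- data.frame(
  Patient  = c("A", "B", "C", "D", "E"),
  eGFR1m   = c(142, 175, 154, 170, 175),
  cr1m     = c(343, 192, 185, 215, 140),
  alb1m    = c(125, 121, 128, 215, 225),
  Diseased = c(1, 0, 0, 1, 0)
)

# Keep only the numeric outcomes: Patient is text and Diseased is the grouping flag
outcomes <- kidney[, c("eGFR1m", "cr1m", "alb1m")]

# quantile() accepts a vector of probabilities, so sapply() returns a matrix
# with one row per statistic and one column per outcome
cohort_summary <- sapply(outcomes, quantile, probs = c(0.5, 0.25, 0.75), na.rm = TRUE)
rownames(cohort_summary) <- c("median", "Q1", "Q3")
cohort_summary
```

The result is a plain numeric matrix, so cohort_summary["median", ] gives the cohort medians and the Q1 and Q3 rows give the bounds of the IQR.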
You may want to write a function that integrates all that in a single call, like this:
descriptive <- function(x = data.frame(), digits = 2, na.rm = TRUE,
                        normality_test = c("shapiro", "ks")) {
  # Fail early on an unknown test name instead of leaving p.value undefined
  normality_test <- match.arg(normality_test)
  n <- length(x)
  # Pre-fill with NA so that non-numeric columns need no special handling
  is.normal <- rep(NA, n)
  medians <- Q1 <- Q3 <- means <- SDs <- rep(NA_real_, n)
  output <- rep(NA_character_, n)
  for (i in seq_along(x)) {
    col <- x[[i]]  # [[ ]] returns a vector for data frames and tibbles alike
    if (!is.numeric(col)) next
    # Round every statistic with the same digits argument
    medians[i] <- round(median(col, na.rm = na.rm), digits = digits)
    Q1[i] <- round(quantile(col, 0.25, na.rm = na.rm), digits = digits)
    Q3[i] <- round(quantile(col, 0.75, na.rm = na.rm), digits = digits)
    means[i] <- round(mean(col, na.rm = na.rm), digits = digits)
    SDs[i] <- round(sd(col, na.rm = na.rm), digits = digits)
    # Normality test: Shapiro-Wilk, or Kolmogorov-Smirnov against N(mean, SD)
    if (normality_test == "shapiro") {
      p.value <- shapiro.test(col)$p.value
    } else {
      p.value <- ks.test(col, "pnorm", means[i], SDs[i])$p.value
    }
    if (p.value <= 0.05) {
      # Normality rejected: report median (Q1-Q3)
      is.normal[i] <- FALSE
      output[i] <- paste0(medians[i], " (", Q1[i], "-", Q3[i], ")")
    } else {
      # Normality not rejected: report mean +- SD
      is.normal[i] <- TRUE
      output[i] <- paste0(means[i], " +-", SDs[i])
    }
  }
  # One row per statistic, one column per input variable
  df <- data.frame(rbind("normal distr" = is.normal, "median" = medians, "Q1" = Q1,
                         "Q3" = Q3, "mean" = means, "SD" = SDs, "output" = output))
  names(df) <- colnames(x)
  df
}
As an example:
> descriptive(iris, normality_test="shapiro")
              Sepal.Length Sepal.Width   Petal.Length   Petal.Width Species
normal distr         FALSE        TRUE          FALSE         FALSE    <NA>
median                 5.8           3           4.35           1.3    <NA>
Q1                     5.1         2.8            1.6           0.3    <NA>
Q3                     6.4         3.3            5.1           1.8    <NA>
mean                  5.84        3.06           3.76           1.2    <NA>
SD                    0.83        0.44           1.77          0.76    <NA>
output       5.8 (5.1-6.4) 3.06 +-0.44 4.35 (1.6-5.1) 1.3 (0.3-1.8)    <NA>
There are several ways to subset your data by a categorical variable for analysis; see dplyr's filter and group_by functions.
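For the diseased versus non-diseased comparison asked about in the question, here is a minimal sketch. It uses base R's aggregate and split rather than the dplyr functions mentioned above, purely to avoid a package dependency. The names kidney, outcome_cols, group_medians and group_iqr are assumptions for illustration, and only three of the outcome columns from the sample table are included.

```r
# Toy data copied from the sample table in the question (first three outcomes only)
kidney <- data.frame(
  Patient  = c("A", "B", "C", "D", "E"),
  eGFR1m   = c(142, 175, 154, 170, 175),
  cr1m     = c(343, 192, 185, 215, 140),
  alb1m    = c(125, 121, 128, 215, 225),
  Diseased = c(1, 0, 0, 1, 0)
)
outcome_cols <- c("eGFR1m", "cr1m", "alb1m")

# Median of every outcome within each level of Diseased (0 = no, 1 = yes)
group_medians <- aggregate(kidney[outcome_cols],
                           by = list(Diseased = kidney$Diseased),
                           FUN = median, na.rm = TRUE)

# Q1 and Q3 per group: split the outcomes by Diseased, then summarise each piece
group_iqr <- lapply(
  split(kidney[outcome_cols], kidney$Diseased),
  function(g) sapply(g, quantile, probs = c(0.25, 0.75), na.rm = TRUE)
)

group_medians
group_iqr
```

group_medians has one row per group, and group_iqr is a list with elements named "0" and "1", each holding a matrix of the 25% and 75% quantiles. With only two diseased patients in the toy data the quartiles are of course not meaningful; the point is the shape of the calls.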