I have a loop that goes through a data frame, runs t-tests, and stores the resulting p-value of each t-test in another data frame.
Here is the loop, where 'mydata' is the data frame that the t-tests are run on. 'mydata' has 4 columns:
df <- mydata
mydf <- data.frame(c(1:4))
# this is the new dataframe being initialized to store my p-values
row.names(mydf) <- names(df)
for(i in names(df)){
  if(sd(df[[i]]) == 0) {
    # this prevents the loop from terminating and returning an error when ttests
    # are run on columns with binary values
  } else {
    ttest <- t.test(df[df$Pre==1,][[i]], df[df$Pre==2,][[i]], paired=FALSE)
    # 'Pre' is the column that groups my data into
    # distinct cohorts. I am comparing the Pre cohort versus the Post cohort
    # in these ttests.
    mydf[i,1] <- ttest$p.value
  }
}
mydf
Here is my output of mydf for an unpaired (paired=FALSE) ttest:
           c.1.4.
density 0.3569670
clust   0.9715987
Pre     3.0000000
HC      4.0000000
However, when I change paired=FALSE to paired=TRUE (to run a paired ttest), here is mydf:
        c.1.4.
density      1
clust        2
Pre          3
HC           4
I checked this line of my loop in isolation, using the first column of my data frame (the 1 in double brackets) with paired=TRUE, and it does appear to output a p-value:
ttest <- t.test(df[df$Pre==1,][[1]], df[df$Pre==2,][[1]], paired=TRUE)
ttest$p.value
[1] 0.356967
Below is a sample dataset that you can use to reproduce the error:
density clust Pre HC
RDHC008A_13 0.47991 0.676825 1 1
RDHC009A_13 0.49955 0.696441 1 1
RDHC010A_16 0.491454 0.706507 1 1
RDHC013A_13 0.442879 0.689118 1 1
RDHC014A_13 0.453823 0.691603 1 1
RDHC016A_16 0.481259 0.706978 1 1
RDHC019A_06 0.515442 0.699514 1 1
RDHC021A_15 0.449925 0.685202 1 1
RDHC022A_12 0.461319 0.705446 1 1
RDHC023A_11 0.468816 0.667698 1 1
RDHC024A_12 0.515142 0.719474 1 1
RDHC025A_13 0.496702 0.710877 1 1
RDHC026A_12 0.477061 0.695061 1 1
RDHC027A_12 0.515442 0.722269 1 1
RDHC029A_12 0.406747 0.669998 1 1
RDHC030A_12 0.476162 0.69219 1 1
RDHC032B_13 0.50075 0.685474 1 1
RDHC034B_07 0.525487 0.725558 1 1
RDHC036B_07 0.468816 0.698904 1 1
RDHC038B_07 0.470015 0.706668 1 1
RDHC039B_07 0.511544 0.712818 1 1
RDHC041A_14 0.551574 0.732983 1 1
RDHC004C_12 0.486207 0.695121 2 1
RDHC005C_12 0.505997 0.695598 2 1
RDHC006C_13 0.487406 0.697044 2 1
RDHC013C_12 0.41979 0.685518 2 1
RDHC015C_13 0.297751 0.69632 2 1
RDHC016C_16 0.463718 0.700011 2 1
RDHC019C_14 0.508096 0.690071 2 1
RDHC021C_12 0.448426 0.688265 2 1
RDHC022C_12 0.468816 0.700968 2 1
RDHC024C_12 0.515292 0.70664 2 1
RDHC025C_13 0.473163 0.704231 2 1
RDHC027C_12 0.518741 0.732939 2 1
RDHC030C_11 0.489205 0.708174 2 1
You can import it as follows.
Copy the data and paste it between the quotation marks in the code below:
zz <- ""
Now read the data into a data.frame:
mydata <- read.table(text=zz, header=TRUE)
I have no idea why changing the 'paired' parameter to TRUE would cause this to happen. Any help/advice would be much appreciated. Thanks - Paul
You initialize the mydf data.frame with the values 1:4 here:
mydf <- data.frame(c(1:4))
With paired=TRUE the loop never writes anything into it, because t.test throws an error: your two sets of values aren't the same length, and they need to be for a paired t-test. You have 22 values where Pre==1 and 13 values where Pre==2. You can't do a paired test with an imbalance like that, so the 1 to 4 you see are just the initial values, not p-values.
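If it helps, here is a minimal sketch of one way to make this kind of failure visible instead of silent. It uses a small made-up data frame (not your data), fills the result column with NA rather than 1:4, and wraps t.test in tryCatch so the error message is kept for each column. The names vars, pvals, errs, and group_sizes are placeholders I picked for the example, not anything from your code.

```r
# Toy data, made up for illustration: 5 rows in cohort 1, 3 rows in cohort 2
df <- data.frame(
  density = c(0.48, 0.50, 0.49, 0.44, 0.45, 0.49, 0.51, 0.42),
  clust   = c(0.68, 0.70, 0.71, 0.69, 0.69, 0.70, 0.70, 0.69),
  Pre     = c(1, 1, 1, 1, 1, 2, 2, 2)
)

# The group sizes differ, so the observations cannot be paired up
group_sizes <- table(df$Pre)
print(group_sizes)

# Start from NA rather than 1:4, so an untouched cell clearly means "no result"
vars  <- setdiff(names(df), "Pre")
pvals <- data.frame(p.value = rep(NA_real_, length(vars)), row.names = vars)
errs  <- character(0)

for (v in vars) {
  x <- df[df$Pre == 1, v]
  y <- df[df$Pre == 2, v]
  # Catch the error instead of letting it abort the loop, and keep its message
  res <- tryCatch(
    t.test(x, y, paired = TRUE)$p.value,
    error = function(e) {
      errs[v] <<- conditionMessage(e)
      NA_real_
    }
  )
  pvals[v, "p.value"] <- res
}

print(pvals)  # both entries stay NA because every paired test failed
print(errs)   # the message t.test() raised for each column
```

With NA as the starting value, a column that was skipped or failed can't be mistaken for a p-value. Checking table(df$Pre) before running a paired test also tells you up front whether pairing is even possible.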