Search code examples
rdistributionfdr

How to find value that corresponds to FDR=0.05 given distribution A and a null distribution?


I have two vectors of correlations: one which represents real correlations, and the other permuted correlations (the null distribution). I want to find the correlation value that corresponds to a FDR of 0.05.

Updated approach:

cor_real=rnorm(1000,0,sd=0.2)
cor_null=rnorm(1000,0,sd=0.15)

d_real=density(cor_real,from=max(min(cor_real),min(cor_null)),to=min(max(cor_real),max(cor_null)))
d_null=density(cor_null,from=max(min(cor_real),min(cor_null)),to=min(max(cor_real),max(cor_null)))
# here we ensure that the x values are comparable between the two densities

plot(d_real)
lines(d_null)

enter image description here

Then, to find the correlation value that corresponds to FDR = 0.05, my guess would be:

ratios=d_null$y/d_real$y
d_real$x[which(round(ratios,2)==.05)]
[1] 0.5694628 0.5716372 0.5868581 0.5890325 0.5912069
# this the the correlation value(s) that corresponds to a 5% chance of a false positive

Is this the right approach?


E.g.:

cor_real=rnorm(100,0.25,sd=0.1)
cor_null=rnorm(100,0.2,sd=0.1)

h_real=hist(cor_real,plot=F)
h_null=hist(cor_null,plot=F)

plot(h_null,col=rgb(1,0,0,.5),xlim=c(0,1),ylim=c(0,max(h_real$counts))) # in red
plot(h_real,col=rgb(0,.5,.5,0.25),add=T) # in blue

Real = blue, null = red

I think this is when the ratio of the frequencies of the two histograms = 0.05 (null:real), but I'm not 100% sure about that.

How might I find the correlation value that corresponds to FDR = 0.05, having "access" to both the null and real distributions?


Solution

  • Density is not quite correct because 1. you did not set n and from, to to be the same, 2. it calculates the number of false positive / number of false negative at only 1 bin.

    False discovery rate is defined as FP / (FP + TP). See this post too. We can calculate this once we put the two correlations in the same vector, label and order them:

    set.seed(321)
    cor_real=rnorm(1000,0,sd=0.2)
    cor_null=rnorm(1000,0,sd=0.15)
    
    df = data.frame(rho = c(cor_real,cor_null),
                    type = rep(c(TRUE,FALSE),each=1000))
    df$rho = abs(df$rho)
    df = df[order(df$rho,decreasing=TRUE),]
    
    df$FP = cumsum(df$type == FALSE)
    df$TP = cumsum(df$type == TRUE)
    
    df$FDR = df$FP / (df$FP + df$TP)
    

    If you look at the results,

    head(df,25)
               rho  type FP TP        FDR
    366  0.5822139  TRUE  0  1 0.00000000
    247  0.5632078  TRUE  0  2 0.00000000
    298  0.5594879  TRUE  0  3 0.00000000
    147  0.5460875  TRUE  0  4 0.00000000
    781  0.5373146  TRUE  0  5 0.00000000
    760  0.5367116  TRUE  0  6 0.00000000
    797  0.5216281  TRUE  0  7 0.00000000
    569  0.5204598  TRUE  0  8 0.00000000
    374  0.5200687  TRUE  0  9 0.00000000
    744  0.5101275  TRUE  0 10 0.00000000
    864  0.5058457  TRUE  0 11 0.00000000
    227  0.4997959  TRUE  0 12 0.00000000
    66   0.4993164  TRUE  0 13 0.00000000
    14   0.4886520  TRUE  0 14 0.00000000
    830  0.4840573  TRUE  0 15 0.00000000
    261  0.4765394  TRUE  0 16 0.00000000
    1163 0.4703764 FALSE  1 16 0.05882353
    27   0.4661862  TRUE  1 17 0.05555556
    965  0.4633883  TRUE  1 18 0.05263158
    530  0.4608271  TRUE  1 19 0.05000000
    96   0.4575683  TRUE  1 20 0.04761905
    851  0.4563224  TRUE  1 21 0.04545455
    922  0.4516161  TRUE  1 22 0.04347826
    343  0.4511517  TRUE  1 23 0.04166667
    

    At abs(rho) >= 0.4511517, you have 1 FP and 23 TP, giving you an FDR of 0.0416.. which is below the FDR of 0.05. So you can set your absolute cutoff here.

    The example you have is pretty hard to test because both are almost the same null hypothesis with only a different sd. In real life most likely we would need to simulate data to find the correlation we obtain under the null hypothesis. And you will see that the calculation above should work pretty well.