
Adding regression line via abline(lm(y~x)) in R produces odd result with -log10


In my field of study, it is well-established that there is a linear relationship between two variables -log10(x) and y.

I made the following scatterplot in R, with the code:

plot(-log10(LDR2EUR$V5),LDR2EUR$V6,ylab="r2 to rs13169313", xlab="-log10(association p-value)",col=ifelse(LDR2EUR$V6==1,'purple',LDR2EUR$V7), pch=20)

and I then attempted to add a regression line via:

abline(lm(LDR2EUR$V6~-log10(LDR2EUR$V5)))


However, the line does not fit the data like a best fit line should.

I am wondering if the poor line fit may have to do with the -log10?

Since

cor(LDR2$V6,-log10(LDR2$V5))

returns 0.9776906, it seems to me that the result should not be a horizontal line, but rather a line similar to y = x.

Any guidance would be much appreciated.


Solution

  • It's a formula problem. It has nothing to do with log10; it comes from how a "-" sign is interpreted inside a formula expression:

    lm(LDR2EUR$V6 ~ -log10(LDR2EUR$V5))
    

    This does not regress V6 on the negative of log10(V5). Instead it removes the log10(V5) term from the model, leaving only the intercept. (You ended up plotting a horizontal line at the mean of V6.) Try instead:

    abline( lm( LDR2EUR$V6 ~ I(-log10(LDR2EUR$V5)) ) )
    

    It's possible that you really wanted the following (if theory supports a line through (0,0)):

    abline( lm( LDR2EUR$V6 ~ I(-log10(LDR2EUR$V5)) - 1 ) )
    

    This also does not subtract 1 from anything. In a formula, -1 removes the y-intercept term and forces the fit to go through the origin.
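
To see the difference concretely, here is a minimal sketch on simulated data. The vectors p and r2 are made up for illustration and only stand in for LDR2EUR$V5 and LDR2EUR$V6; the slope of 0.1 and the noise level are arbitrary assumptions, not values from the question.

```r
# Simulated stand-in data (hypothetical names; not the asker's data set)
set.seed(1)
p  <- 10^(-runif(200, 0, 8))                      # p-values over several orders of magnitude
r2 <- 0.1 * (-log10(p)) + rnorm(200, sd = 0.03)   # roughly linear in -log10(p)

fit_dropped <- lm(r2 ~ -log10(p))         # "-" removes the term: intercept-only model
fit_wrapped <- lm(r2 ~ I(-log10(p)))      # I() makes "-" ordinary arithmetic
fit_origin  <- lm(r2 ~ I(-log10(p)) - 1)  # "- 1" drops the intercept

coef(fit_dropped)  # one coefficient, equal to mean(r2)
coef(fit_wrapped)  # intercept plus a slope close to 0.1
coef(fit_origin)   # slope only

plot(-log10(p), r2, pch = 20)
abline(fit_dropped, lty = 2)  # dashed horizontal line at mean(r2)
abline(fit_wrapped)           # line that follows the points
```

An alternative that avoids the formula operators altogether is to compute the transformed variable first, for example nlp <- -log10(p), and then fit lm(r2 ~ nlp).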