In my field of study, it is well-established that there is a linear relationship between two variables -log10(x) and y.
I made the following scatterplot in R, with the code:
plot(-log10(LDR2EUR$V5),LDR2EUR$V6,ylab="r2 to rs13169313", xlab="-log10(association p-value)",col=ifelse(LDR2EUR$V6==1,'purple',LDR2EUR$V7), pch=20)
and I then attempted to add a regression line via:
abline(lm(LDR2EUR$V6~-log10(LDR2EUR$V5)))
However, the line does not fit the data like a best fit line should.
I am wondering if the poor line fit may have to do with the -log10?
Since
cor(LDR2$V6,-log10(LDR2$V5))
returns 0.9776906, it seems to me that the result should not be a horizontal line, but rather a line similar to y=x.
Any guidance would be much appreciated.
It's a formula problem and has nothing to do with log10. It comes from how "-" signs are interpreted in formula expressions:
lm(LDR2EUR$V6 ~ -log10(LDR2EUR$V5))
This does not regress V6 against the negative log10 of V5; instead it removes that term from the model, leaving only the intercept. (You ended up plotting a horizontal line at the mean of V6.) Try instead:
abline( lm( LDR2EUR$V6 ~ I(-log10(LDR2EUR$V5)) ) )
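To see the difference between the two formulas, here is a minimal sketch on simulated data. The vectors p and r2 are made up for illustration and only stand in for LDR2EUR$V5 and LDR2EUR$V6; the slope of 0.1 is an arbitrary assumption.

```r
set.seed(1)
p  <- 10^(-runif(50, 0, 8))                     # hypothetical p-values
r2 <- 0.1 * (-log10(p)) + rnorm(50, sd = 0.02)  # hypothetical response, roughly linear in -log10(p)

bad  <- lm(r2 ~ -log10(p))     # formula "-" removes the term: intercept-only model
good <- lm(r2 ~ I(-log10(p)))  # I() makes "-" an arithmetic minus

coef(bad)    # a single coefficient, equal to mean(r2)
coef(good)   # intercept and slope
```

Passing bad to abline() draws a horizontal line at mean(r2), while good gives the expected sloped line.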
It's possible that you really wanted (if theory supports a line through (0,0)):
abline( lm( LDR2EUR$V6 ~ I(-log10(LDR2EUR$V5)) - 1 ) )
Here the - 1 likewise does not subtract 1 from anything; in a formula it removes the y-intercept and forces the fit to go through the origin.
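The same distinction applies to the - 1: outside I() it is formula syntax, inside I() it is arithmetic. A small sketch with simulated data (x and y are hypothetical, with x standing in for the -log10 values):

```r
set.seed(1)
x <- runif(50, 0, 8)                  # hypothetical stand-in for -log10(p-value)
y <- 0.1 * x + rnorm(50, sd = 0.02)   # hypothetical response

origin  <- lm(y ~ x - 1)      # formula minus: drops the intercept
shifted <- lm(y ~ I(x - 1))   # arithmetic minus: regresses on (x - 1), intercept kept

coef(origin)    # slope only
coef(shifted)   # intercept and slope
```

Writing y ~ 0 + x is an equivalent way to drop the intercept and may be easier to read.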