I've been trying to fit my data to a logistic curve. I have come across 2 different packages that I have been working with. DRM
and NLS
, so starting with drm
I am able to fit a model which is visualized as below.
Now my issue with this is the model summary.
Text
:
Formula: percent_farm_tractor ~ SSlogis(year, Asym, xmid, scal)
Parameters:
Estimate Std. Error t value Pr(>|t|)
Asym 2.265 2.527 0.896 0.4207
xmid 1975.306 17.589 112.305 3.77e-08 ***
scal 9.575 2.674 3.580 0.0232 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03798 on 4 degrees of freedom
Number of iterations to convergence: 0
Achieved convergence tolerance: 5.703e-06
Unless I am interpreting this incorrectly, the b
coefficient represents the slope. How can this be negative when the model fit plot even shows a positive trend as year increases??
The nls model does return a positive number for the slope
but I'm having trouble getting the nls package to work with the rest of my data.
Thanks
EDIT:
I used split
to obtain multiple data.frames which each contain multiple records relating to a county and each record is a different year. So for example here a 2 different data.frame representing 2 different counties:
`Sample County 1
year state county name percent_farm_tractor stateAbb fips colorID
2 1925 1 3 BALDWIN 0.06760000 AL 1003 1
69 1930 1 3 BALDWIN 0.08679707 AL 1003 2
136 1940 1 3 BALDWIN 0.19938885 AL 1003 3
203 1950 1 3 BALDWIN 0.44627821 AL 1003 7
270 1954 1 3 BALDWIN 0.56669298 AL 1003 9
337 1964 1 3 BALDWIN 0.75094340 AL 1003 12
404 1969 1 3 BALDWIN 0.89988623 AL 1003 14
Sample County 2
:
year state county name percent_farm_tractor stateAbb fips colorID
476 1925 5 13 CALHOUN 0.000000000 AR 5013 1
551 1930 5 13 CALHOUN 0.006680027 AR 5013 1
626 1940 5 13 CALHOUN 0.027145359 AR 5013 1
701 1950 5 13 CALHOUN 0.187435633 AR 5013 3
776 1954 5 13 CALHOUN 0.333333333 AR 5013 5
851 1964 5 13 CALHOUN 0.530150754 AR 5013 8
926 1969 5 13 CALHOUN 0.929824561 AR 5013 14
I am applying drm
to each of these data.frames as demostrated below:
j <- 1
params <- data.frame()
for(j in 1:length(split_df)){
if(nrow(split_df[[j]]) != 1){
mL <- drm(percent_farm_tractor ~ year, data =
as.data.frame(split_df[[j]]), fct = L.3(), type = "continuous")
params <- rbind(params, coef(mL))
}
}
Basically every county has a negative slope value, which seems very counter intuitive as basically every county also has a positive trend as year increases.
tl;dr Just flip the sign; it's an unusual parameterization where negative b
corresponds to an increasing function. (I don't know why I never noticed this before; maybe because I've generally focused on the intercept/ED50 parameter ...)
This is the expression given for the generalized logistic in ?drc::logistic
:
f(x) = c + \frac{d-c}{(1+\exp(b(x - e)))^f}
For the three-parameter logistic, c=0, f=1, so we have
f(x) = \frac{d}{(1+\exp(b(x - e)))}
It's easy to see that d
is the upper asymptote and the e
is the half-maximum (when x=e
, the equation reduces to \frac{d}{(1+\exp(0)} = d/2
). b
is indeed a slope, but the key is that for x>e
the denominator is an increasing function of b
; that means that the overall expression is an decreasing function of b
.
This is in contrast to the more standard parameterization of the logistic which adds a negative sign before the slope, e.g. from plogis
:
F(x) = 1 / (1 + exp(-(x-m)/s))
Note the -
before (x-m)/s
! Here s
is a scale, m
is the half-max, 1/s
is the slope ...