r linear-regression

In summary.lm(P.for.trend) : essentially perfect fit: summary may be unreliable; How to deal with this?


I used the following data and code to assess the trend in misconduct over the years, but I got strange results from the linear regression model, as you can see below. I have read prior answers, but I still do not understand my problem. Should I use non-linear regression instead? If so, which type of regression would you recommend?

Any input will be greatly appreciated.

dataYear.Pub.MISCONDUCT<-read.table(text= "Year Yes
1965 100.00000   0.00000
1971 100.00000   0.00000
1973 100.00000   0.00000
1974   0.00000 100.00000
1975   0.00000 100.00000
1976   0.00000 100.00000
1977 100.00000   0.00000
1978 100.00000   0.00000
1979  66.66667  33.33333
1980  60.00000  40.00000
1981  70.00000  30.00000
1982  75.00000  25.00000
1983  54.54545  45.45455
1984  50.00000  50.00000
1985  20.00000  80.00000
1986  87.50000  12.50000
1987 100.00000   0.00000
1988  57.14286  42.85714
1989  60.00000  40.00000
1990  61.29032  38.70968
1991  65.00000  35.00000
1992  71.42857  28.57143
1993  43.75000  56.25000
1994  33.33333  66.66667
1995  43.75000  56.25000
1996  40.00000  60.00000
1997  41.46341  58.53659
1998  28.35821  71.64179
1999  17.24138  82.75862
2000  15.62500  84.37500
2001  38.37209  61.62791
2002  36.14458  63.85542
2003  37.14286  62.85714
2004  27.65957  72.34043
2005  32.93413  67.06587
2006  30.58252  69.41748
2007  28.20513  71.79487
2008  32.94574  67.05426
2009  31.06061  68.93939
2010  32.20339  67.79661
2011  33.11475  66.88525
2012  35.95166  64.04834
2013  31.17647  68.82353
2014  25.00000  75.00000
2015  32.27384  67.72616
2016  49.49833  50.50167
2017  55.37849  44.62151
2018  59.67742  40.32258
2019  65.17413  34.82587
2020  65.38462  34.61538 ", sep="", header=T);dataYear.Pub.MISCONDUCT

P.for.trend<-lm(dataYear.Pub.MISCONDUCT$Year~dataYear.Pub.MISCONDUCT$Yes);
summary (P.for.trend)

Results:

> Call:
lm(formula = dataYear.Pub.MISCONDUCT$Year ~ dataYear.Pub.MISCONDUCT$Yes)

Residuals:
       Min         1Q     Median         3Q        Max 
-1.946e-14 -5.051e-15 -2.349e-15  1.044e-15  1.459e-13 

Coefficients:
                              Estimate Std. Error    t value Pr(>|t|)    
(Intercept)                  1.000e+02  6.834e-15  1.463e+16   <2e-16 ***
dataYear.Pub.MISCONDUCT$Yes -1.000e+00  1.184e-16 -8.449e+15   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.241e-14 on 48 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 7.139e+31 on 1 and 48 DF,  p-value: < 2.2e-16

Warning message: In summary.lm(P.for.trend) : essentially perfect fit: summary may be unreliable


Solution

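  • The warning comes from typos in how the data are read, not from the choice of model. The header names only two columns (Year Yes) while every row has three values, so read.table() uses the first column (the real years) as row names. The column labelled Year then holds the Yes percentages and the column labelled Yes holds the No percentages. Those two always sum to 100, so Year = 100 - Yes exactly, which is the intercept of 100 and slope of -1 in your output. Add the missing No to the header and, assuming you want to predict the percent yes based on year, try: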

    P.for.trend <- lm(Yes ~ Year, data = dataYear.Pub.MISCONDUCT)
    summary(P.for.trend)
    #> 
    #> Call:
    #> lm(formula = Yes ~ Year, data = dataYear.Pub.MISCONDUCT)
    #> 
    #> Residuals:
    #>     Min      1Q  Median      3Q     Max 
    #> -63.029  -9.305  -5.332  16.556  45.607 
    #> 
    #> Coefficients:
    #>              Estimate Std. Error t value Pr(>|t|)   
    #> (Intercept) 1374.4055   488.8403   2.812  0.00712 **
    #> Year          -0.6643     0.2450  -2.712  0.00926 **
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> 
    #> Residual standard error: 25.45 on 48 degrees of freedom
    #> Multiple R-squared:  0.1328, Adjusted R-squared:  0.1148 
    #> F-statistic: 7.353 on 1 and 48 DF,  p-value: 0.009261
    
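    The perfect fit in the question can be reproduced on a handful of its rows. The sketch below is only an illustration: the object names rows and bad are made up here, and it uses four of the original rows rather than the full table. It shows what read.table() does when the header has one fewer name than the rows have values.

    ```r
    # Four rows from the question, with its original two-name header.
    rows <- c("Year Yes",
              "1979  66.66667  33.33333",
              "1980  60.00000  40.00000",
              "1981  70.00000  30.00000",
              "1982  75.00000  25.00000")

    bad <- read.table(text = rows, header = TRUE)

    # The header has one fewer field than the data rows, so read.table()
    # takes the first column (the real years) as row names.
    rownames(bad)   # the real years, stored as row names
    names(bad)      # "Year" "Yes": these are really the Yes and No columns

    # The two columns are complementary percentages, so they sum to 100
    # (up to rounding). That is why lm(Year ~ Yes) fits perfectly.
    range(bad$Year + bad$Yes)
    ```

    Checking names() and rownames() straight after reading the data is a cheap way to catch this kind of header mismatch before fitting anything.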

    Your data, with No added to the header so that all three columns are named:

    
    dataYear.Pub.MISCONDUCT <- 
    readr::read_table2("Year Yes No
    1965 100.00000   0.00000
    1971 100.00000   0.00000
    1973 100.00000   0.00000
    1974   0.00000 100.00000
    1975   0.00000 100.00000
    1976   0.00000 100.00000
    1977 100.00000   0.00000
    1978 100.00000   0.00000
    1979  66.66667  33.33333
    1980  60.00000  40.00000
    1981  70.00000  30.00000
    1982  75.00000  25.00000
    1983  54.54545  45.45455
    1984  50.00000  50.00000
    1985  20.00000  80.00000
    1986  87.50000  12.50000
    1987 100.00000   0.00000
    1988  57.14286  42.85714
    1989  60.00000  40.00000
    1990  61.29032  38.70968
    1991  65.00000  35.00000
    1992  71.42857  28.57143
    1993  43.75000  56.25000
    1994  33.33333  66.66667
    1995  43.75000  56.25000
    1996  40.00000  60.00000
    1997  41.46341  58.53659
    1998  28.35821  71.64179
    1999  17.24138  82.75862
    2000  15.62500  84.37500
    2001  38.37209  61.62791
    2002  36.14458  63.85542
    2003  37.14286  62.85714
    2004  27.65957  72.34043
    2005  32.93413  67.06587
    2006  30.58252  69.41748
    2007  28.20513  71.79487
    2008  32.94574  67.05426
    2009  31.06061  68.93939
    2010  32.20339  67.79661
    2011  33.11475  66.88525
    2012  35.95166  64.04834
    2013  31.17647  68.82353
    2014  25.00000  75.00000
    2015  32.27384  67.72616
    2016  49.49833  50.50167
    2017  55.37849  44.62151
    2018  59.67742  40.32258
    2019  65.17413  34.82587
    2020  65.38462  34.61538")
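    If you would rather not depend on readr (read_table2() has been deprecated in newer readr releases in favour of read_table()), the same fix should work in base R. This is a minimal sketch under assumptions: rows and dat are illustrative names, and only eight of the original rows are used to keep it short, so its coefficients will not match the full-data output above.

    ```r
    # Eight rows from the question, now with all three column names.
    rows <- c("Year Yes No",
              "1979  66.66667  33.33333",
              "1980  60.00000  40.00000",
              "1981  70.00000  30.00000",
              "1982  75.00000  25.00000",
              "1983  54.54545  45.45455",
              "1984  50.00000  50.00000",
              "1985  20.00000  80.00000",
              "1986  87.50000  12.50000")

    dat <- read.table(text = rows, header = TRUE)

    # Sanity checks: Year is a real column and Yes + No is 100 per row.
    str(dat)
    stopifnot(all(abs(dat$Yes + dat$No - 100) < 1e-4))

    # Regress percent yes on year, passing the data frame via `data`
    # instead of using dataYear.Pub.MISCONDUCT$... inside the formula.
    fit <- lm(Yes ~ Year, data = dat)
    coef(fit)
    ```

    Using the data argument keeps the coefficient labels short (Year rather than dataYear.Pub.MISCONDUCT$Year) and makes predict() easier to use on new years.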