Tags: r, linear-regression, lm

order/number of variables in lm causing singularities?


I was trying to run a linear model using lm() in R with 12 explanatory variables and 33 observations, but the coefficients for the last three variables are not defined because of singularities. When I switched the order of the variables, the same thing happened again: the last three variables were dropped, even though those variables (TotalPrec_11, TotalPrec_12, TotalPrec_10) were significant before. The coefficients also differed between the two models.

ab <- lm(value ~ TotalPrec_12 + TotalPrec_11 + TotalPrec_10 + TotalPrec_9 + TotalPrec_8 + TotalPrec_7 + TotalPrec_6 + TotalPrec_5 + TotalPrec_4 + TotalPrec_3 + TotalPrec_2 + TotalPrec_1, data = aa)

summary(ab)

#Coefficients: (3 not defined because of singularities)
#              Estimate Std. Error t value Pr(>|t|)  
#(Intercept)      64.34      30.80   2.089   0.0480 *
#TotalPrec_12  19811.97   11080.14   1.788   0.0869 .
#TotalPrec_11 -16159.45    7099.89  -2.276   0.0325 *
#TotalPrec_10 -16500.62   18813.96  -0.877   0.3895  
#TotalPrec_9   62662.08   51143.37   1.225   0.2329  
#TotalPrec_8     665.39   36411.95   0.018   0.9856  
#TotalPrec_7  -77203.59   51555.71  -1.497   0.1479  
#TotalPrec_6    4830.11   19503.52   0.248   0.8066  
#TotalPrec_5    6403.94   14902.77   0.430   0.6714  
#TotalPrec_4    -735.73    5023.83  -0.146   0.8848  
#TotalPrec_3         NA         NA      NA       NA  
#TotalPrec_2         NA         NA      NA       NA  
#TotalPrec_1         NA         NA      NA       NA  

The same data here with a different order of variables:

ab1 <- lm(value ~ TotalPrec_1 + TotalPrec_2 + TotalPrec_3 + TotalPrec_9 + TotalPrec_8 + TotalPrec_7 + TotalPrec_6 + TotalPrec_5 + TotalPrec_4 + TotalPrec_11 + TotalPrec_12 + TotalPrec_10, data = aa)

summary(ab1)

#Coefficients: (3 not defined because of singularities)
#              Estimate Std. Error t value Pr(>|t|)  
#(Intercept)      63.72      54.44   1.171   0.2538  
#TotalPrec_1   19611.54   19366.33   1.013   0.3218  
#TotalPrec_2  -14791.44    7847.87  -1.885   0.0722 .
#TotalPrec_3    6766.60    3144.68   2.152   0.0422 *
#TotalPrec_9   28677.62   53530.82   0.536   0.5973  
#TotalPrec_8  -23207.34   65965.12  -0.352   0.7282  
#TotalPrec_7  -26628.10   55839.25  -0.477   0.6380  
#TotalPrec_6  -28694.23   13796.80  -2.080   0.0489 *
#TotalPrec_5   46982.35   17941.89   2.619   0.0154 *
#TotalPrec_4  -26393.70   17656.70  -1.495   0.1486  
#TotalPrec_11        NA         NA      NA       NA  
#TotalPrec_12        NA         NA      NA       NA  
#TotalPrec_10        NA         NA      NA       NA  

Several posts online suggest that this might be a multicollinearity problem. I ran the cor() function to check for collinearity, and no pair of variables came out perfectly correlated.

I used the same set of 12 variables with other response variables, and there was no problem with singularities. So I'm not sure what is happening here or what I need to do differently to figure this out.

Edit: here is my data.

> dput(aa)
structure(list(value = c(93, 95, 88, 90, 90, 80, 100, 80, 96, 
100, 100, 100, 80, 94, 88, 76, 90, 0, 93, 100, 88, 90, 95, 71, 
92, 93, 92, 100, 85, 90, 100, 100, 100), TotalPrec_1 = c(0.00319885835051536, 
0.00319885835051536, 0.00319885835051536, 0.00717973057180643, 
0.00717973057180643, 0.00717973057180643, 0.00464357063174247, 
0.00464357063174247, 0.00464357063174247, 0.00598198547959327, 
0.00598198547959327, 0.00598198547959327, 0.00380058260634541, 
0.00380058260634541, 0.00380058260634541, 0.00380058260634541, 
0.00364887388423085, 0.00364887388423085, 0.00364887388423085, 
0.00475014140829443, 0.00475014140829443, 0.00475014140829443, 
0.00475014140829443, 0.00499139120802283, 0.00499139120802283, 
0.00499139120802283, 0.00499139120802283, 0.00490436097607016, 
0.00490436097607016, 0.00490436097607016, 0.00623255362734198, 
0.00623255362734198, 0.00623255362734198), TotalPrec_2 = c(0.00387580785900354, 
0.00387580785900354, 0.00387580785900354, 0.00625309534370899, 
0.00625309534370899, 0.00625309534370899, 0.00298969540745019, 
0.00298969540745019, 0.00298969540745019, 0.00558579061180353, 
0.00558579061180353, 0.00558579061180353, 0.00370361795648932, 
0.00370361795648932, 0.00370361795648932, 0.00370361795648932, 
0.00335893919691443, 0.00335893919691443, 0.00335893919691443, 
0.00621500937268137, 0.00621500937268137, 0.00621500937268137, 
0.00621500937268137, 0.00234323320910334, 0.00234323320910334, 
0.00234323320910334, 0.00234323320910334, 0.00644989637658, 0.00644989637658, 
0.00644989637658, 0.00476496992632746, 0.00476496992632746, 0.00476496992632746
), TotalPrec_3 = c(0.00418250449001789, 0.00418250449001789, 
0.00418250449001789, 0.00702223135158419, 0.00702223135158419, 
0.00702223135158419, 0.00427648611366748, 0.00427648611366748, 
0.00427648611366748, 0.00562589056789875, 0.00562589056789875, 
0.00562589056789875, 0.0037367227487266, 0.0037367227487266, 
0.0037367227487266, 0.0037367227487266, 0.00477339653298258, 
0.00477339653298258, 0.00477339653298258, 0.0124167986214161, 
0.0124167986214161, 0.0124167986214161, 0.0124167986214161, 0.010678518563509, 
0.010678518563509, 0.010678518563509, 0.010678518563509, 0.0139585845172405, 
0.0139585845172405, 0.0139585845172405, 0.00741709442809224, 
0.00741709442809224, 0.00741709442809224), TotalPrec_4 = c(0.00659881485626101, 
0.00659881485626101, 0.00659881485626101, 0.00347008113749325, 
0.00347008113749325, 0.00347008113749325, 0.00720167113468051, 
0.00720167113468051, 0.00720167113468051, 0.00704727275297045, 
0.00704727275297045, 0.00704727275297045, 0.00856677815318107, 
0.00856677815318107, 0.00856677815318107, 0.00856677815318107, 
0.00867980346083641, 0.00867980346083641, 0.00867980346083641, 
0.00614343490451574, 0.00614343490451574, 0.00614343490451574, 
0.00614343490451574, 0.00704662408679723, 0.00704662408679723, 
0.00704662408679723, 0.00704662408679723, 0.00495137926191091, 
0.00495137926191091, 0.00495137926191091, 0.00796654727309942, 
0.00796654727309942, 0.00796654727309942), TotalPrec_5 = c(0.00515584181994199, 
0.00515584181994199, 0.00515584181994199, 0.000977653078734875, 
0.000977653078734875, 0.000977653078734875, 0.00485571753233671, 
0.00485571753233671, 0.00485571753233671, 0.00477610062807798, 
0.00477610062807798, 0.00477610062807798, 0.00664602871984243, 
0.00664602871984243, 0.00664602871984243, 0.00664602871984243, 
0.00533714797347784, 0.00533714797347784, 0.00533714797347784, 
0.00265633105300366, 0.00265633105300366, 0.00265633105300366, 
0.00265633105300366, 0.00200922577641904, 0.00200922577641904, 
0.00200922577641904, 0.00200922577641904, 0.00172789173666387, 
0.00172789173666387, 0.00172789173666387, 0.00347296684049069, 
0.00347296684049069, 0.00347296684049069), TotalPrec_6 = c(0.00170362275093793, 
0.00170362275093793, 0.00170362275093793, 0.000670029199682176, 
0.000670029199682176, 0.000670029199682176, 0.0018315939232707, 
0.0018315939232707, 0.0018315939232707, 0.00138648133724927, 
0.00138648133724927, 0.00138648133724927, 0.00329410820268094, 
0.00329410820268094, 0.00329410820268094, 0.00329410820268094, 
0.00210500298999249, 0.00210500298999249, 0.00210500298999249, 
0.000628655252512544, 0.000628655252512544, 0.000628655252512544, 
0.000628655252512544, 0.000631613133009523, 0.000631613133009523, 
0.000631613133009523, 0.000631613133009523, 0.000616533157881349, 
0.000616533157881349, 0.000616533157881349, 0.000599739549215883, 
0.000599739549215883, 0.000599739549215883), TotalPrec_7 = c(0.00124496815260499, 
0.00124496815260499, 0.00124496815260499, 0.000289129035081714, 
0.000289129035081714, 0.000289129035081714, 0.00089572963770479, 
0.00089572963770479, 0.00089572963770479, 0.00187503395136445, 
0.00187503395136445, 0.00187503395136445, 0.00070394336944446, 
0.00070394336944446, 0.00070394336944446, 0.00070394336944446, 
0.000733022985514253, 0.000733022985514253, 0.000733022985514253, 
4.50894685855019e-06, 4.50894685855019e-06, 4.50894685855019e-06, 
4.50894685855019e-06, 3.02730550174601e-05, 3.02730550174601e-05, 
3.02730550174601e-05, 3.02730550174601e-05, 3.71173496205301e-06, 
3.71173496205301e-06, 3.71173496205301e-06, 4.58224167232402e-05, 
4.58224167232402e-05, 4.58224167232402e-05), TotalPrec_8 = c(0.000394100265111774, 
0.000394100265111774, 0.000394100265111774, 0.000930351321585476, 
0.000930351321585476, 0.000930351321585476, 0.000679628865327686, 
0.000679628865327686, 0.000679628865327686, 0.000997507828287781, 
0.000997507828287781, 0.000997507828287781, 1.77486290340312e-05, 
1.77486290340312e-05, 1.77486290340312e-05, 1.77486290340312e-05, 
1.63553704624064e-05, 1.63553704624064e-05, 1.63553704624064e-05, 
4.31556363764685e-05, 4.31556363764685e-05, 4.31556363764685e-05, 
4.31556363764685e-05, 8.14739760244265e-05, 8.14739760244265e-05, 
8.14739760244265e-05, 8.14739760244265e-05, 4.07490988436621e-05, 
4.07490988436621e-05, 4.07490988436621e-05, 0.000140139847644605, 
0.000140139847644605, 0.000140139847644605), TotalPrec_9 = c(0.000616681878454983, 
0.000616681878454983, 0.000616681878454983, 0.000742240983527154, 
0.000742240983527154, 0.000742240983527154, 0.000230846126214601, 
0.000230846126214601, 0.000230846126214601, 0.00132466584909707, 
0.00132466584909707, 0.00132466584909707, 0.000114383190521039, 
0.000114383190521039, 0.000114383190521039, 0.000114383190521039, 
6.07241054240149e-05, 6.07241054240149e-05, 6.07241054240149e-05, 
2.74324702331796e-05, 2.74324702331796e-05, 2.74324702331796e-05, 
2.74324702331796e-05, 6.96572624292457e-06, 6.96572624292457e-06, 
6.96572624292457e-06, 6.96572624292457e-06, 3.32364725181833e-05, 
3.32364725181833e-05, 3.32364725181833e-05, 0.000108777909190394, 
0.000108777909190394, 0.000108777909190394), TotalPrec_10 = c(0.00040393992094323, 
0.00040393992094323, 0.00040393992094323, 0.00166831514798104, 
0.00166831514798104, 0.00166831514798104, 0.000324568885844201, 
0.000324568885844201, 0.000324568885844201, 0.000868275004904717, 
0.000868275004904717, 0.000868275004904717, 1.25834640130051e-05, 
1.25834640130051e-05, 1.25834640130051e-05, 1.25834640130051e-05, 
7.2861012085923e-06, 7.2861012085923e-06, 7.2861012085923e-06, 
0.000946254527661949, 0.000946254527661949, 0.000946254527661949, 
0.000946254527661949, 0.000476793473353609, 0.000476793473353609, 
0.000476793473353609, 0.000476793473353609, 0.00102826312649995, 
0.00102826312649995, 0.00102826312649995, 0.00266417209059, 0.00266417209059, 
0.00266417209059), TotalPrec_11 = c(0.00124716362915933, 0.00124716362915933, 
0.00124716362915933, 0.00470362277701497, 0.00470362277701497, 
0.00470362277701497, 0.0017967780586332, 0.0017967780586332, 
0.0017967780586332, 0.000694554066285491, 0.000694554066285491, 
0.000694554066285491, 0.000485763972392306, 0.000485763972392306, 
0.000485763972392306, 0.000485763972392306, 0.00074231723556295, 
0.00074231723556295, 0.00074231723556295, 0.000763822405133396, 
0.000763822405133396, 0.000763822405133396, 0.000763822405133396, 
0.00114128366112709, 0.00114128366112709, 0.00114128366112709, 
0.00114128366112709, 0.000856105296406895, 0.000856105296406895, 
0.000856105296406895, 0.00255026295781135, 0.00255026295781135, 
0.00255026295781135), TotalPrec_12 = c(0.00380058260634541, 0.00380058260634541, 
0.00380058260634541, 0.00475014140829443, 0.00475014140829443, 
0.00475014140829443, 0.00412079365924, 0.00412079365924, 0.00412079365924, 
0.00455283792689442, 0.00455283792689442, 0.00455283792689442, 
0.00117174908518791, 0.00117174908518791, 0.00117174908518791, 
0.00117174908518791, 0.00119069591164588, 0.00119069591164588, 
0.00119069591164588, 0.00201585865579545, 0.00201585865579545, 
0.00201585865579545, 0.00201585865579545, 0.00202310062013566, 
0.00202310062013566, 0.00202310062013566, 0.00202310062013566, 
0.00231692171655595, 0.00231692171655595, 0.00231692171655595, 
0.00495567917823791, 0.00495567917823791, 0.00495567917823791
)), row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"
))

Solution

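  • When you have multiple predictors, singularity doesn’t necessarily mean that two variables are perfectly correlated. It means that at least one of your variables can be written exactly as a linear combination of the other variables (and the intercept), even if none of those variables is a perfect predictor on its own. That is why cor() shows no pair at ±1. When you have many predictors relative to few observations, as you do, the odds of this happening increase. In your data it is sharper than 12 versus 33: the predictor values repeat in blocks of three or four rows, so the 33 observations contain only 10 distinct combinations of predictor values. The design matrix therefore has rank at most 10, which is enough for the intercept plus nine slopes, and lm() drops the remaining three. Which three are dropped depends on the order of the terms, because lm() keeps the earlier columns and reports the later, dependent ones as NA; this is also why the surviving coefficients change when you reorder the formula. So you will need to simplify your model: nine slopes is the most this data can estimate.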
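To make the point concrete, here is a minimal sketch on made-up data rather than the posted `aa`. The names `toy`, `site_values`, `site_id` and `x1`..`x12` are hypothetical; the toy data only assumes the same shape as the question (12 predictors, 33 rows, 10 distinct predictor combinations). It counts the distinct predictor rows, checks the rank of the design matrix, shows that no pair of predictors is perfectly correlated, and confirms that reordering the terms changes which coefficients are NA but not the fitted values.

```r
# Toy data (hypothetical, not the asker's `aa`): 10 sites, each with one set of
# 12 predictor values, repeated so that there are 33 observations in total.
set.seed(1)
n_sites <- 10
site_values <- matrix(rnorm(n_sites * 12), nrow = n_sites,
                      dimnames = list(NULL, paste0("x", 1:12)))
site_id <- rep(seq_len(n_sites), times = c(3, 3, 3, 3, 4, 3, 4, 4, 3, 3))
toy <- data.frame(value = rnorm(length(site_id)), site_values[site_id, ])

# Count distinct predictor rows: the design matrix (intercept + 12 columns)
# cannot have a rank higher than this number.
n_distinct <- nrow(unique(toy[, -1]))
X <- model.matrix(value ~ ., data = toy)
design_rank <- qr(X)$rank

# Largest absolute pairwise correlation: a value below 1 shows that cor()
# alone does not reveal the linear dependence.
cors <- cor(toy[, -1])
max_abs_cor <- max(abs(cors[upper.tri(cors)]))

# Fit with the predictors in their original order, then in reverse order.
fit <- lm(value ~ ., data = toy)
fit_rev <- lm(reformulate(rev(colnames(site_values)), response = "value"),
              data = toy)

# Number of coefficients lm() could not estimate, and which terms they are.
n_dropped <- sum(is.na(coef(fit)))
dropped_terms <- rownames(alias(fit)$Complete)
dropped_terms_rev <- rownames(alias(fit_rev)$Complete)

# Both fits span the same column space, so the fitted values should agree
# even though the individual coefficients do not.
same_fit <- isTRUE(all.equal(fitted(fit), fitted(fit_rev)))
```

On the data in the question, the equivalent checks would be `nrow(unique(aa[, -1]))`, which should come out as 10, and `alias(ab)`, which expresses each dropped term as a combination of the terms that were kept.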