I am working with the training data from https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data. Two of the columns, Street and Alley, are factors with two levels each (Alley also has many NA's), and the target is SalePrice. Here is their summary:
  Street        Alley         SalePrice
 Grvl:   6    Grvl:  50    Min.   : 34900
 Pave:1454    Pave:  41    1st Qu.:129975
              NA's:1369    Median :163000
                           Mean   :180921
                           3rd Qu.:214000
                           Max.   :755000
Running a linear regression on each factor separately works fine.
> summary(lm(SalePrice ~ Street, data=train))
Call:
lm(formula = SalePrice ~ Street, data = train)
Residuals:
    Min      1Q  Median      3Q     Max
-146231  -51131  -18131   32869  573869
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   130190      32416   4.016 6.21e-05 ***
StreetPave     50940      32483   1.568    0.117
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 79400 on 1458 degrees of freedom
Multiple R-squared: 0.001684, Adjusted R-squared: 0.0009992
F-statistic: 2.459 on 1 and 1458 DF, p-value: 0.117
> summary(lm(SalePrice ~ Alley, data=train))
Call:
lm(formula = SalePrice ~ Alley, data = train)
Residuals:
    Min      1Q  Median      3Q     Max
-128001  -17001    1781   16999  133781
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   122219       5153  23.718  < 2e-16 ***
AlleyPave      45782       7677   5.963  4.9e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 36440 on 89 degrees of freedom
(1369 observations deleted due to missingness)
Multiple R-squared: 0.2855, Adjusted R-squared: 0.2775
F-statistic: 35.56 on 1 and 89 DF, p-value: 4.9e-08
However, when I include both factors in the same model, it fails with an error that doesn't make sense to me, since both factors have two levels.
> summary(lm(SalePrice ~ Street+Alley, data=train))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
Can someone explain what is going wrong here?
I got a hint from this line in the question: (1369 observations deleted due to missingness)
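By default, lm drops every row that has a missing value in any variable used in the model. When both Street and Alley are in the formula, the 1369 rows where Alley is NA are dropped, and in the remaining 91 rows Street takes only one value, Pave. lm also drops unused factor levels, so Street is left with a single level, and contrasts cannot be built for a one-level factor. That is what the error is saying. You can check this directly: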
> train[!is.na(Alley), Street]
[1] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[16] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[31] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[46] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[61] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[76] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[91] Pave
Levels: Grvl Pave
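To make the mechanism easier to see, here is a minimal sketch on made-up data rather than the real Kaggle file. The data frame `train_toy`, its values, and the column `AlleyNA` are hypothetical names introduced only for this illustration. It counts the levels each factor has left after incomplete rows are dropped, reproduces the error, and then tries one possible workaround: turning NA into an explicit level with addNA so that no rows are dropped. Whether that workaround is appropriate depends on NA being a meaningful category in your data (as far as I can tell, the data description treats NA in Alley as "no alley access", but please check that yourself).

```r
# Toy data standing in for the Kaggle `train` table (values are made up).
train_toy <- data.frame(
  Street    = factor(c("Grvl", "Pave", "Pave", "Pave", "Pave", "Pave")),
  Alley     = factor(c(NA, NA, "Grvl", "Pave", "Grvl", "Pave")),
  SalePrice = c(100000, 150000, 120000, 170000, 125000, 180000)
)

# Drop incomplete rows and unused levels, mimicking what lm() does by default,
# then count how many levels each factor still has.
complete <- droplevels(na.omit(train_toy[, c("Street", "Alley", "SalePrice")]))
levels_left <- sapply(complete[c("Street", "Alley")], nlevels)
print(levels_left)  # Street is down to a single level

# The joint model fails on the toy data for the same reason as in the question.
fit_err <- tryCatch(
  lm(SalePrice ~ Street + Alley, data = train_toy),
  error = function(e) conditionMessage(e)
)
print(fit_err)

# One possible workaround: make "missing" an explicit factor level,
# so that no rows are removed and Street keeps both of its levels.
train_toy$AlleyNA <- addNA(train_toy$Alley)
fit_ok <- lm(SalePrice ~ Street + AlleyNA, data = train_toy)
print(coef(fit_ok))
```

The same level check (na.omit, then droplevels, then nlevels) can be run on the real columns before fitting, to spot any factor that collapses to one level once the NA rows are gone.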