What is the difference between approach 1 and approach 2 below? I thought that I() would let us multiply two variables without including an interaction term, but it is not working as expected here. Do I understand correctly that the second approach also takes into account the three zeros (the non-USA rows)? So the model is built on 6 points instead of 3. Can we somehow fix it?
df <- data.frame(
Salary=c(5, 1:2,4,1:2),
Variable1=c(500,490,501,460,490,505),
Variable2=c(5,10,0,3,17,40),
Country=c(rep("USA",3),rep("RPA",3)),
Dummy_USA=c(rep(1,3), rep(0,3))
)
library(dplyr) # provides %>% and filter()
# Approach 1
summary(lm(Salary ~ Variable1, df %>% filter(Country == "USA")))
# Approach 2
summary(lm(Salary~I(Variable1*Dummy_USA), df))
Yes, the second version simply regresses the vector c(5, 1, 2, 4, 1, 2) on the vector c(500, 490, 501, 0, 0, 0). I() only makes R evaluate the product arithmetically; it does not drop any rows. This is very different from the first version, which regresses the vector c(5, 1, 2) on the vector c(500, 490, 501).
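To make the difference concrete, here is a small sketch that rebuilds the relevant columns of the question's data and prints the predictor that I(Variable1 * Dummy_USA) actually produces, along with the number of observations each fit uses. The object names x_masked, fit_masked and fit_usa are only illustrative.

```r
# Same Salary, Variable1 and Dummy_USA columns as in the question
df <- data.frame(
  Salary    = c(5, 1:2, 4, 1:2),
  Variable1 = c(500, 490, 501, 460, 490, 505),
  Dummy_USA = c(rep(1, 3), rep(0, 3))
)

# The predictor built by I(Variable1 * Dummy_USA): non-USA rows become 0
x_masked <- with(df, Variable1 * Dummy_USA)
print(x_masked)
#> [1] 500 490 501   0   0   0

# Approach 2 keeps all six rows; restricting to USA rows keeps three
fit_masked <- lm(Salary ~ I(Variable1 * Dummy_USA), data = df)
fit_usa    <- lm(Salary ~ Variable1, data = df, subset = Dummy_USA == 1)
print(nobs(fit_masked))
#> [1] 6
print(nobs(fit_usa))
#> [1] 3
```

The three zeros are treated as genuine predictor values, so they pull on both the intercept and the slope of the second model.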
If you want to use a dummy variable, you could pass it either to the subset argument of lm or to the weights argument.
with(df, summary(lm(Salary ~ Variable1, subset = Dummy_USA == 1)))
#>
#> Call:
#> lm(formula = Salary ~ Variable1, subset = Dummy_USA == 1)
#>
#> Residuals:
#>       1       2       3
#>  1.6847 -0.1532 -1.5315
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -104.7928   131.8453  -0.795    0.572
#> Variable1      0.2162     0.2653   0.815    0.565
#>
#> Residual standard error: 2.282 on 1 degrees of freedom
#> Multiple R-squared: 0.3992, Adjusted R-squared: -0.2017
#> F-statistic: 0.6644 on 1 and 1 DF, p-value: 0.5646
or
with(df, summary(lm(Salary ~ Variable1, weights = Dummy_USA)))
#>
#> Call:
#> lm(formula = Salary ~ Variable1, weights = Dummy_USA)
#>
#> Weighted Residuals:
#>       1       2       3       4       5       6
#>  1.6847 -0.1532 -1.5315  0.0000  0.0000  0.0000
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -104.7928   131.8453  -0.795    0.572
#> Variable1      0.2162     0.2653   0.815    0.565
#>
#> Residual standard error: 2.282 on 1 degrees of freedom
#> Multiple R-squared: 0.3992, Adjusted R-squared: -0.2017
#> F-statistic: 0.6644 on 1 and 1 DF, p-value: 0.5646
Created on 2023-03-20 with reprex v2.0.2
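As a quick sanity check, the two fits can also be compared programmatically rather than by reading the summaries. This is a minimal sketch that rebuilds the relevant columns of the question's data; fit_subset and fit_weights are illustrative names.

```r
# Same Salary, Variable1 and Dummy_USA columns as in the question
df <- data.frame(
  Salary    = c(5, 1:2, 4, 1:2),
  Variable1 = c(500, 490, 501, 460, 490, 505),
  Dummy_USA = c(rep(1, 3), rep(0, 3))
)

# Fit once by dropping the non-USA rows, once by giving them zero weight
fit_subset  <- lm(Salary ~ Variable1, data = df, subset = Dummy_USA == 1)
fit_weights <- lm(Salary ~ Variable1, data = df, weights = Dummy_USA)

# Both routes should give the same coefficients (up to numerical tolerance)
print(all.equal(coef(fit_subset), coef(fit_weights)))

# Zero-weight rows do not count towards the residual degrees of freedom
print(df.residual(fit_subset))
print(df.residual(fit_weights))
```

The subset version is usually the clearer choice when the goal is simply to restrict the fit to one group; the weights version keeps all rows in the model frame, which is why its residual vector has six entries.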