I am trying to write this formula in R, where i indexes the values of category (category can be 1, 2, 3, or 4).
This is my code attempt, but R prints this error message:
Error in lm(category ~ (year * state * district) + year + state + district + :
formal argument "data" matched by multiple actual arguments
I am trying to express a summation, so I added several sets of terms, each followed by its own data argument. Is there another way to write the summation that avoids this error? I searched online but could not find anything similar; I am guessing it is rare to add a summation to a regression. Thank you in advance for any help.
ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,
17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,
33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48)
year <- c(1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,
1981,1981,1981,1981,1981,1981,1981,1981,1981,1981,1981,1981,1981,1981,1981,1981,
1982,1982,1982,1982,1982,1982,1982,1982,1982,1982,1982,1982,1982,1982,1982,1982)
state <- c("NY","NY","NY","NY","NY","NY","NY","NY","CA","CA","CA","CA","CA","CA","CA","CA",
"NY","NY","NY","NY","NY","NY","NY","NY","CA","CA","CA","CA","CA","CA","CA","CA",
"NY","NY","NY","NY","NY","NY","NY","NY","CA","CA","CA","CA","CA","CA","CA","CA")
district <- c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2,
1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2,
1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)
quantity <- c(100,200,45,87,65,32,94,52,67,72,14,53,28,94,12,41,
10,20,45,87,65,32,8,52,67,1,14,53,28,94,12,41,
1000,2000,45,87,9,32,94,5,6,7,1,5,2,9,1,4)
category <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,
1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,
1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
df <- data.frame(ID,year,state,district,quantity,category)
df$year <- as.factor(df$year)
df$state <- as.factor(df$state)
df$district <- as.factor(df$district)
df$category <- as.factor(df$category)
print(df)
# force regression baseline values
# relevel() returns a new factor, so the result has to be assigned back
df$year <- relevel(df$year, ref = '1981')
df$district <- relevel(df$district, ref = '2')
# r1 is when y = 1
r1 <- lm( category ~ (year*state*district) +
quantity + district + state + year,
data = subset(df, year == 1980)
+
(year*state*district) +
quantity + district + state + year,
data = subset(df, year == 1981)
+
(year*state*district) +
quantity + district + state + year,
data = subset(df, year == 1980)
)
summary(r1)
# r2 is when y = 2
r2 <- lm( category ~ (year*state*district) +
year + state + district + quantity,
data = subset(df, year == 1980)
+
(year*state*district) +
year + state + district + quantity,
data = subset(df, year == 1981)
+
(year*state*district) +
year + state + district + quantity,
data = subset(df, year == 1980)
)
summary(r2)
and likewise for r3 and r4.
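On the error message itself: R matches each named argument of a call to at most one formal argument, so writing `data =` more than once inside a single `lm()` call fails during argument matching, before any model is fitted. The sketch below reproduces this on a small made-up data frame (`toy`, `y`, `x`, and `g` are hypothetical names, not from the question) and then shows the usual alternative of one call with one data frame and the grouping variable in the formula.

```r
# Toy data (hypothetical), used only to reproduce the argument-matching error.
toy <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                  x = c(1, 2, 3, 4, 5, 6),
                  g = factor(c("a", "a", "a", "b", "b", "b")))

# Supplying `data =` twice in one call fails before lm() does any fitting.
msg <- tryCatch(
  lm(y ~ x, data = subset(toy, g == "a"), data = subset(toy, g == "b")),
  error = function(e) conditionMessage(e)
)
print(msg)

# One call, one data frame: the grouping variable goes into the formula.
fit <- lm(y ~ x + g, data = toy)
print(coef(fit))
```

A sum over groups in the written model usually corresponds to terms in the formula, not to several data sets passed to one call.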
There are several problems here:
The line with the summation near the beginning needs a coefficient multiplying each term.
What does year[i, y] mean? It is not defined.
Linear regression is not appropriate for a categorical response. Assuming that the categories are unordered, we can use multinomial logistic regression.
Interactions normally require that all lower-order terms (main effects and lower-order interactions) be included as well.
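To illustrate the point about lower-order terms, here is a small sketch that only inspects formulas and fits nothing. In R's formula syntax `a * b * c` already expands to all main effects and all lower-order interactions, so adding `+ year + state + district` after `year*state*district` is redundant. The second formula uses the `/` nesting operator as one possible alternative; whether nesting suits the model is an assumption on my part.

```r
# Term labels produced by the three-way crossing operator.
full <- attr(terms(~ year * state * district), "term.labels")
print(full)

# Term labels when district and state (and their interaction) are nested in year.
nested <- attr(
  terms(category ~ year/(district * state) + district + state + quantity),
  "term.labels"
)
print(nested)
```

`terms()` works on the formula alone, so this can be run without any data to check what a formula expands to before fitting.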
Perhaps you want this:
library(nnet)
fm <- multinom(category ~ year/(district * state) + district + state + quantity, df)
summary(fm)
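As a rough sketch of how the fitted model could be used, the block below rebuilds the question's data compactly with `rep()` (the `ID` column is left out), refits the same formula, and extracts predicted probabilities and predicted classes. `trace = FALSE` only silences the iteration log. Each year/state/district cell contains exactly one observation per category, so the predictions here mainly show the mechanics.

```r
library(nnet)

# Compact reconstruction of the question's data frame (same 48 rows, no ID).
df <- data.frame(
  year     = factor(rep(c(1980, 1981, 1982), each = 16)),
  state    = factor(rep(rep(c("NY", "CA"), each = 8), times = 3)),
  district = factor(rep(rep(c(1, 2), each = 4), times = 6)),
  quantity = c(100, 200, 45, 87, 65, 32, 94, 52, 67, 72, 14, 53, 28, 94, 12, 41,
               10, 20, 45, 87, 65, 32, 8, 52, 67, 1, 14, 53, 28, 94, 12, 41,
               1000, 2000, 45, 87, 9, 32, 94, 5, 6, 7, 1, 5, 2, 9, 1, 4),
  category = factor(rep(1:4, times = 12))
)

fm <- multinom(category ~ year/(district * state) + district + state + quantity,
               data = df, trace = FALSE)

# One row per observation, one column per category; each row sums to 1.
probs <- predict(fm, type = "probs")
print(head(round(probs, 3)))

# Most likely category per observation, cross-tabulated against the truth.
pred_class <- predict(fm, type = "class")
print(table(observed = df$category, predicted = pred_class))
```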
fm is of class "multinom" with these methods:
methods(class = "multinom")
## [1] add1 anova coef confint drop1 extractAIC
## [7] logLik model.frame predict print summary vcov
## see '?methods' for accessing help and source code
For interpretation see https://stats.oarc.ucla.edu/r/dae/multinomial-logistic-regression/
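`summary()` for a multinom fit reports coefficients and standard errors but no p-values. A minimal sketch of the Wald z-test calculation commonly used for this, plus exponentiated coefficients (relative risk ratios against the baseline category), is given below. It rebuilds the question's data compactly and refits the same formula; with so few observations per cell the standard errors may be unstable, so treat the numbers as illustrative only.

```r
library(nnet)

# Compact reconstruction of the question's data frame (same 48 rows, no ID).
df <- data.frame(
  year     = factor(rep(c(1980, 1981, 1982), each = 16)),
  state    = factor(rep(rep(c("NY", "CA"), each = 8), times = 3)),
  district = factor(rep(rep(c(1, 2), each = 4), times = 6)),
  quantity = c(100, 200, 45, 87, 65, 32, 94, 52, 67, 72, 14, 53, 28, 94, 12, 41,
               10, 20, 45, 87, 65, 32, 8, 52, 67, 1, 14, 53, 28, 94, 12, 41,
               1000, 2000, 45, 87, 9, 32, 94, 5, 6, 7, 1, 5, 2, 9, 1, 4),
  category = factor(rep(1:4, times = 12))
)

fm <- multinom(category ~ year/(district * state) + district + state + quantity,
               data = df, trace = FALSE)

# Wald z-statistics and two-sided p-values, one row per non-baseline category.
s <- summary(fm)
z <- s$coefficients / s$standard.errors
p <- 2 * (1 - pnorm(abs(z)))
print(round(p, 3))

# Exponentiated coefficients: relative risk ratios versus category 1.
rrr <- exp(coef(fm))
print(round(rrr, 3))
```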