I am trying to write this formula in R, where i indexes the values of category (category can be 1, 2, 3, or 4).
This is my code attempt, but R prints this error message:
Error in lm(category ~ (year * state * district) + year + state + district + :
formal argument "data" matched by multiple actual arguments
I am trying to create a summation, so I added multiple arguments after data. Is there another way to write the summation that avoids the error message? I checked online but could not find anything similar; I am guessing it is rare to add a summation to a regression. Thank you in advance for any help.
ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,
year <- c(1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,1980,
state <- c("NY","NY","NY","NY","NY","NY","NY","NY","CA","CA","CA","CA","CA","CA","CA","CA",
district <- c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2,
quantity <- c(100,200,45,87,65,32,94,52,67,72,14,53,28,94,12,41,
category <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,
df <- data.frame(ID,year,state,district,quantity,category)
df$year <- as.factor(df$year)
df$state <- as.factor(df$state)
df$district <- as.factor(df$district)
df$category <- as.factor(df$category)
# force regression baseline values
df$year <- relevel(df$year, ref = '1981')
df$district <- relevel(df$district, ref = '2')
# r1 is when y = 1
r1 <- lm( category ~ (year*state*district) +
quantity + district + state + year,
data = subset(df, year == 1980)
(year*state*district) +
quantity + district + state + year,
data = subset(df, year == 1981)
(year*state*district) +
quantity + district + state + year,
data = subset(df, year == 1980)
# r2 is when y = 2
r2 <- lm( category ~ (year*state*district) +
year + state + district + quantity,
data = subset(df, year == 1980)
(year*state*district) +
year + state + district + quantity,
data = subset(df, year == 1981)
(year*state*district) +
year + state + district + quantity,
data = subset(df, year == 1980)
r3 and r4 then follow the same pattern.
There are several problems here:
The line with the summation near the beginning needs coefficients multiplying each term.
What does year[i, y] mean? It is not defined.
Linear regression is not appropriate for a categorical response. Assuming that the categories are unordered, we can use multinomial logistic regression.
Interactions normally require that all lower-order interactions be included as well.
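On the last point, R's formula operators already take care of this: a * b is shorthand for a + b + a:b, so the separate + year + state + district terms in the question's formula are redundant rather than harmful. A minimal check, using only the variable names from the question (no data is needed to expand a formula):

```r
# Expand each formula into its individual terms; no data frame is required.
short <- attr(terms(category ~ year * state * district), "term.labels")
long  <- attr(terms(category ~ year * state * district + year + state + district),
              "term.labels")

# Main effects plus every two-way and three-way interaction
print(short)

# Duplicate terms are dropped, so both formulas should describe the same model
print(identical(short, long))
```

The comparison should come out TRUE, which is why writing the main effects again after a * interaction changes nothing.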
Perhaps you want this:
library(nnet)  # provides multinom()
fm <- multinom(category ~ year/(district * state) + district + state + quantity, df)
The result fm is of class "multinom", which has these methods:
methods(class = "multinom")
## [1] add1 anova coef confint drop1 extractAIC
## [7] logLik model.frame predict print summary vcov
## see '?methods' for accessing help and source code
For interpretation see https://stats.oarc.ucla.edu/r/dae/multinomial-logistic-regression/
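As a rough illustration of a few of those methods, here is a minimal sketch on made-up data. The data frame toy, its values, and the simplified formula are invented for this example and are not the question's data or model; nnet is a recommended package that ships with standard R installations.

```r
library(nnet)  # provides multinom()

set.seed(1)
n <- 200

# Synthetic data, invented purely for illustration
toy <- data.frame(
  category = factor(sample(1:4, n, replace = TRUE)),
  state    = factor(sample(c("NY", "CA"), n, replace = TRUE)),
  quantity = runif(n, 10, 200)
)

# trace = FALSE suppresses the optimizer's iteration log
fm_toy <- multinom(category ~ state + quantity, data = toy, trace = FALSE)

coef(fm_toy)                              # one row per non-baseline category
probs <- predict(fm_toy, type = "probs")  # n x 4 matrix of fitted probabilities
head(predict(fm_toy, type = "class"))     # most likely category for each row
```

Each row of probs sums to 1 across the four categories. The coefficients are log-odds relative to the baseline category, which is the first factor level unless it is changed with relevel().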