I'm running an MLR in R examining the effects of 4 explanatory variables (Temperature, Dissolved Oxygen, Practical Salinity, and Oxidative Reductive Potential) on 1 response variable (Shell Roundness):
shell_round_mlr <- lm(Shell_Round ~ TempC + O2 + PSU + ORP, data = morph.na)
The dataset (morph.na) in question has 53 rows of data. When I run the following code to examine the model...
par(mfrow = c(2,2))
plot(shell_round_mlr)
I get these plots:
![Residuals vs. Fitted Value, Normal Q-Q, Scale-Location, Residuals vs. Leverage](https://i.sstatic.net/Lkkmd.png)
These show observations #65 and #159 as ones I might want to remove. However, how can I have an observation #159 when I only have 53 rows of data? I have triple-checked that I am calling the correct dataframe.
Also, if I wanted to remove any of these troublesome observations, how would I go about doing that? It does not seem to be as simple as removing a row from the dataframe by its number.
Any advice would be appreciated. Thank you.
It's difficult to diagnose your problem without a reproducible example. But as @aosmith noted in a comment, `plot` labels points using the data frame's row names, not the row positions. This example shows `lm` plots with labelled values above the total sample size.
set.seed(1L)
df <- data.frame(x = rnorm(20), y = rnorm(20))
rownames(df) <- sample(50:70, 20)
fit <- lm(y ~ x, data = df)
plot(fit)
By comparison, here's the same plot with `labels.id = NULL`, which labels points by observation number instead:
plot(fit, labels.id = NULL)
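As for removing points: below is a minimal sketch, assuming the data frame lost rows through something like `na.omit()` (a guess based on the name `morph.na`), which keeps the original row names and would explain labels larger than the number of rows. The toy data and the names `raw`, `clean`, `drop_ids`, and `trimmed` are hypothetical. The idea is to drop rows by row name, since that is what the plot labels refer to, and then refit.

```r
# Toy data standing in for the original data frame (hypothetical values).
set.seed(1L)
raw <- data.frame(x = rnorm(10), y = rnorm(10))
raw$y[c(2, 5, 7)] <- NA           # introduce some missing values

# na.omit() drops incomplete rows but keeps the original row names.
clean <- na.omit(raw)
nrow(clean)                       # 7 rows remain
rownames(clean)                   # "1" "3" "4" "6" "8" "9" "10"

fit <- lm(y ~ x, data = clean)

# Drop observations by row name (the labels plot.lm prints), not by position.
drop_ids <- c("8", "10")          # hypothetical labels, chosen for illustration
trimmed <- clean[!(rownames(clean) %in% drop_ids), ]

# Refit the same model on the reduced data.
fit_trimmed <- update(fit, data = trimmed)
nobs(fit_trimmed)                 # 5
```

In your case the equivalent would be something like `morph.na[!(rownames(morph.na) %in% c("65", "159")), ]`. If you would rather have labels that match row positions, `rownames(trimmed) <- NULL` resets them to 1, 2, 3, and so on.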