r, linear-regression, mlr

Why do I have more observations in my Multiple Linear Regression than I do rows in my dataframe in R?


I'm running an MLR in R examining the effects of 4 explanatory variables (Temperature, Dissolved Oxygen, Practical Salinity, and Oxidation-Reduction Potential) on 1 response variable (Shell Roundness):

shell_round_mlr <- lm(Shell_Round ~ TempC + O2 + PSU + ORP, data = morph.na)

The dataset (morph.na) in question has 53 rows of data. When I run the following code to examine the model...

par(mfrow = c(2,2))
plot(shell_round_mlr)

I get these plots:

[Figure: Residuals vs. Fitted Value, Normal Q-Q, Scale-Location, Residuals vs. Leverage] https://i.sstatic.net/Lkkmd.png

The plots flag observations #65 and #159 as ones I might want to remove. However, how is it possible to have an observation #159 when I only have 53 rows of data? I have triple-checked that I am calling the correct dataframe.

Also, in this case, if I wanted to remove any of these troublesome observations, how would I go about doing that? It does not seem to be as simple as removing a row from the dataframe.

Any advice would be appreciated. Thank you.


Solution

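  • It's difficult to diagnose your problem without a reproducible example. But as @aosmith noted in a comment, plot labels points with the data frame's row names, not with row positions. If morph.na was created by subsetting or filtering a larger data frame, its rows keep their original row names, so a 53-row data frame can still contain a row named 159. This example shows lm diagnostic plots whose point labels exceed the total sample size: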

    set.seed(1L)
    df <- data.frame(x = rnorm(20), y = rnorm(20))
    rownames(df) <- sample(50:70, 20)
    
    fit <- lm(y ~ x, data = df)
    
    plot(fit)
    


    By comparison, here's the same plot with labels.id = NULL, which labels points by their position (1 to 20) instead of their row names:

    plot(fit, labels.id = NULL)
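
    As for how row names can exceed the row count in the first place, one common route is dropping incomplete rows. The sketch below uses a small made-up data frame (full, complete, and the values in them are hypothetical, not the asker's data) to show that na.omit() keeps the original row names, and that lm carries them through to the residuals that plot uses for labels.

    ```r
    # Hypothetical data: 8 rows, three of which have a missing response.
    full <- data.frame(
      x = c(1.2, 0.7, 3.1, 2.4, 5.0, 4.3, 2.9, 3.8),
      y = c(NA, 2.1, NA, 3.9, NA, 6.2, 4.8, 5.5)
    )

    # na.omit() drops rows 1, 3 and 5 but keeps the original row names.
    complete <- na.omit(full)

    nrow(complete)      # 5 rows remain
    rownames(complete)  # "2" "4" "6" "7" "8"

    # The fitted model inherits those row names; the default point labels
    # in plot() are taken from names(residuals(fit)).
    fit_complete <- lm(y ~ x, data = complete)
    names(residuals(fit_complete))
    ```

    Comparing nrow(morph.na) with rownames(morph.na) in the same way should show whether this is what happened to the data in the question.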
    

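
    On the second part of the question, removing flagged observations: because the labels are row names, one option is to filter on rownames() rather than on row position and then refit. This is a minimal sketch using the simulated data from above; picking the two largest Cook's distances is only an illustrative way to choose rows, and flagged, df_trimmed, and fit_trimmed are hypothetical names.

    ```r
    set.seed(1L)
    df <- data.frame(x = rnorm(20), y = rnorm(20))
    rownames(df) <- sample(50:70, 20)
    fit <- lm(y ~ x, data = df)

    # Illustrative choice only: flag the two rows with the largest
    # Cook's distance. The result is a character vector of row names.
    flagged <- names(sort(cooks.distance(fit), decreasing = TRUE))[1:2]

    # Drop rows by name (not by position) and refit the same model.
    df_trimmed <- df[!(rownames(df) %in% flagged), ]
    fit_trimmed <- lm(y ~ x, data = df_trimmed)

    nrow(df_trimmed)  # 18

    # For the data in the question, the analogous filter would be:
    # morph.na[!(rownames(morph.na) %in% c("65", "159")), ]
    ```

    Positional indexing such as morph.na[-159, ] would not select the intended row here, since a 53-row data frame has no 159th row. Whether dropping these points is statistically justified is a separate question from how to do it.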