Why doesn't R unique() work with fitted values extracted from lm() objects?


unique() returns the unique values of a vector.

If I have a data frame:

test_data <- data.frame(x = c(rep(1.00050239485720394857,4),
                              1.00050239485720394854,rep(2.0002230948570293845,5),rep(3.0005903847502398475,5)),
                        y = c(rep(4.00423409872345,5),rep(2.034532039485722,5),rep(1.1234152304957,5)))
sapply(test_data,unique)

R returns:

            x        y
[1,] 1.000502 4.004234
[2,] 2.000223 2.034532
[3,] 3.000590 1.123415

As expected.

But say I fit an lm() or aov() object and then try to find unique fitted values():

set.seed(123)

y = rf(100,50,3,3)
x1 <- factor(c(rep("blue",25),
               rep("green",25),
               rep("orange",25),
               rep("purple",25)))

bsFit <- aov(y ~ x1)
unique(bsFit$fitted.values) 

R returns:

 [1] 2.709076 2.709076 2.709076 2.709076 2.709076 2.709076
 [7] 2.709076 4.060080 4.060080 4.060080 4.060080 3.314801
[13] 3.314801 3.314801 3.314801 1.960280 1.960280 1.960280
[19] 1.960280 1.960280

There are clearly duplicates here.


Solution

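  • As others have said (@Tim-Biegeleisen especially), the output is formatted to a limited number of significant digits when it is printed (7 by default, controlled by getOption("digits")). This formatting is done by R's print method, so it happens in any R console, not only RStudio. The "duplicates" differ in digits beyond those shown, so they aren't actually duplicates. The differences are tiny (around 1e-14), which is consistent with floating-point rounding error in the fitting computation.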

    We can use format to show all decimal places:

    format(unique(bsFit$fitted.values), digits = 22)
     [1] "2.7090760788376542" "2.7090760788376773" "2.7090760788376604" "2.7090760788376622" "2.7090760788376627"
     [6] "2.7090760788376649" "2.7090760788376640" "4.0600797479202155" "4.0600797479202164" "4.0600797479202200"
    [11] "4.0600797479202146" "3.3148005388803132" "3.3148005388803128" "3.3148005388803146" "3.3148005388803137"
    [16] "1.9602804435309986" "1.9602804435309984" "1.9602804435309982" "1.9602804435309988" "1.9602804435310004"
    

    I experimented with the number of digits and found that 22 is the largest value accepted before an error is thrown.
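
If the goal is one fitted value per group, one option is to remove the floating-point noise before calling unique(). The following is a minimal sketch, not part of the original answer. It assumes that rounding to 10 decimal places is an acceptable tolerance for this data; that tolerance, and the names fv, unique_fits and per_level, are my own choices for illustration.

```r
# Rebuild the model from the question.
set.seed(123)
y  <- rf(100, 50, 3, 3)
x1 <- factor(rep(c("blue", "green", "orange", "purple"), each = 25))
bsFit <- aov(y ~ x1)

fv <- fitted(bsFit)

# Option 1: round away the noise, then de-duplicate.
# 10 decimal places is an assumed tolerance, not a general rule.
unique_fits <- unique(round(unname(fv), 10))
length(unique_fits)

# Option 2: take the first fitted value within each factor level,
# which avoids choosing a rounding tolerance at all.
per_level <- tapply(fv, x1, function(v) v[1])
per_level
```

With four factor levels, both approaches should give four values. Rounding is the quicker fix, but it can fail if a value happens to sit right on a rounding boundary, so grouping by the factor is the safer choice when the grouping variable is available.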