
Why isn't base scale working with Tibble?


I have a dataset called GSMA that I imported from Excel using readxl. Checking the class of the object returns:

class(GSMA)
[1] "tbl_df"     "tbl"        "data.frame"

I want to standardise columns 2 through 4 using base R's scale(). I try running:

GSMA[2:4] <- scale(GSMA[2:4])

This results in an incorrectly scaled dataframe, with each row having the same value for all columns.

A potential clue to the problem: When I attempt to sort the incorrectly scaled dataframe, this error is returned:

Error in xj[i, , drop = FALSE] : subscript out of bounds

When I re-import the same dataset, and then run:

GSMA <- as.data.frame(GSMA)
GSMA[2:4] <- scale(GSMA[2:4])

The dataframe columns scale correctly.

What is going on? Why does base scale() not work in the first instance?

dput(head(GSMA))

structure(list(Country = c("GBR", "CHE", "DEU", "ROU", "LUX", 
"KAZ"), entry = c(98.4974384307861, 95.6549962361654, 91.4044539133708, 
90.8518393834432, 90.4088099797567, 88.0471547444662), medium = c(86.0081672668457, 
93.0372142791748, 91.2993144989014, 100, 96.7348480224609, 100
), high = c(74.6774760159579, 84.1793060302734, 79.542350769043, 
99.6931856328791, 97.031680020419, 92.5396745855158)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Weirdly, this is correct:

> scale(head(GSMA[2:4]))
          entry     medium       high
[1,]  1.5644225 -1.5528676 -1.3233285
[2,]  0.8257534 -0.2694974 -0.3755223
[3,] -0.2788406 -0.5868048 -0.8380579
[4,] -0.4224492  1.0017748  1.1719851
[5,] -0.5375798  0.4056202  0.9065003
[6,] -1.1513063  1.0017748  0.4584233
attr(,"scaled:center")
   entry   medium     high 
92.47745 94.51326 87.94395 
attr(,"scaled:scale")
    entry    medium      high 
 3.848059  5.477022 10.025077 

But this is not:

> GSMA[2:4] <- scale(GSMA[2:4])
> head(GSMA)
# A tibble: 6 x 4
  Country entry[,"entry"] [,"medium"] [,"high"] medium[,"entry"] [,"medium"] [,"high"]
  <chr>             <dbl>       <dbl>     <dbl>            <dbl>       <dbl>     <dbl>
1 GBR                2.13        1.25     0.870             2.13        1.25     0.870
2 CHE                2.00        1.52     1.27              2.00        1.52     1.27 
3 DEU                1.80        1.46     1.07              1.80        1.46     1.07 
4 ROU                1.78        1.80     1.92              1.78        1.80     1.92 
5 LUX                1.76        1.67     1.81              1.76        1.67     1.81 
6 KAZ                1.65        1.80     1.62              1.65        1.80     1.62 
# ... with 3 more variables: high[,"entry"] <dbl>, [,"medium"] <dbl>, [,"high"] <dbl>

Solution

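  • This is a known issue with tibble 3.0.0: scale() returns a matrix, and that version mishandles a matrix assigned to several tibble columns at once. Reverting to tibble 2.1.3 restores the old behavior.

    Or convert the matrix to a data frame before assigning it: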

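    library(tibble)
    iris <- as_tibble(iris)
    # scale() returns a matrix; store it under a name that does not mask scale()
    scaled <- scale(iris[1:3])
    class(scaled)
    #> [1] "matrix"   (R >= 4.0.0 prints "matrix" "array")
    # convert to a data frame so each column is assigned separately
    iris[1:3] <- as.data.frame(scaled)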
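
Another option, sketched here on the assumption that the goal is simply standardised numeric columns, is to scale each column on its own so that no matrix is ever assigned. The data below reuses the rows from the dput() output in the question; `cols` is a name introduced only for illustration.

```r
# Rows copied from the dput(head(GSMA)) output in the question.
GSMA <- data.frame(
  Country = c("GBR", "CHE", "DEU", "ROU", "LUX", "KAZ"),
  entry = c(98.4974384307861, 95.6549962361654, 91.4044539133708,
            90.8518393834432, 90.4088099797567, 88.0471547444662),
  medium = c(86.0081672668457, 93.0372142791748, 91.2993144989014,
             100, 96.7348480224609, 100),
  high = c(74.6774760159579, 84.1793060302734, 79.542350769043,
           99.6931856328791, 97.031680020419, 92.5396745855158),
  stringsAsFactors = FALSE
)

# Use a tibble when the package is available; the steps below are the
# same for a plain data frame.
if (requireNamespace("tibble", quietly = TRUE)) {
  GSMA <- tibble::as_tibble(GSMA)
}

cols <- 2:4  # illustrative name for the columns to standardise

# scale() on a single vector returns a one-column matrix; as.numeric()
# drops the matrix shape and attributes, leaving a plain numeric vector.
GSMA[cols] <- lapply(GSMA[cols], function(x) as.numeric(scale(x)))

# Each scaled column should have mean ~0 and standard deviation ~1.
round(sapply(GSMA[cols], mean), 10)
sapply(GSMA[cols], sd)
```

Because the right-hand side is a list of plain vectors rather than a matrix, this should not depend on how a given tibble version handles matrix assignment, though it has not been checked against 3.0.0 specifically.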