I have a dataset called GSMA that I imported from Excel using readxl.
Checking the class of the object returns:
class(GSMA)
[1] "tbl_df" "tbl" "data.frame"
I want to standardise columns 2 through 4 using base R's scale(). I try running:
GSMA[2:4] <- scale(GSMA[2:4])
This results in an incorrectly scaled dataframe, with each row having the same value for all columns.
A potential clue to the problem: When I attempt to sort the incorrectly scaled dataframe, this error is returned:
Error in xj[i, , drop = FALSE] : subscript out of bounds
When I re-import the same dataset, and then run:
GSMA <- as.data.frame(GSMA)
GSMA[2:4] <- scale(GSMA[2:4])
The dataframe columns scale correctly.
What is going on? Why is base scale not working in the first instance?
dput(head(GSMA))
structure(list(Country = c("GBR", "CHE", "DEU", "ROU", "LUX",
"KAZ"), entry = c(98.4974384307861, 95.6549962361654, 91.4044539133708,
90.8518393834432, 90.4088099797567, 88.0471547444662), medium = c(86.0081672668457,
93.0372142791748, 91.2993144989014, 100, 96.7348480224609, 100
), high = c(74.6774760159579, 84.1793060302734, 79.542350769043,
99.6931856328791, 97.031680020419, 92.5396745855158)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Weirdly, this is correct:
> scale(head(GSMA[2:4]))
entry medium high
[1,] 1.5644225 -1.5528676 -1.3233285
[2,] 0.8257534 -0.2694974 -0.3755223
[3,] -0.2788406 -0.5868048 -0.8380579
[4,] -0.4224492 1.0017748 1.1719851
[5,] -0.5375798 0.4056202 0.9065003
[6,] -1.1513063 1.0017748 0.4584233
attr(,"scaled:center")
entry medium high
92.47745 94.51326 87.94395
attr(,"scaled:scale")
entry medium high
3.848059 5.477022 10.025077
But this is not:
> GSMA[2:4] <- scale(GSMA[2:4])
> head(GSMA)
# A tibble: 6 x 4
Country entry[,"entry"] [,"medium"] [,"high"] medium[,"entry"] [,"medium"] [,"high"]
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 GBR 2.13 1.25 0.870 2.13 1.25 0.870
2 CHE 2.00 1.52 1.27 2.00 1.52 1.27
3 DEU 1.80 1.46 1.07 1.80 1.46 1.07
4 ROU 1.78 1.80 1.92 1.78 1.80 1.92
5 LUX 1.76 1.67 1.81 1.76 1.67 1.81
6 KAZ 1.65 1.80 1.62 1.65 1.80 1.62
# ... with 3 more variables: high[,"entry"] <dbl>, [,"medium"] <dbl>, [,"high"] <dbl>
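This is a known issue with tibble 3.0.0. scale() returns a matrix, and assigning that matrix to several tibble columns at once stores the whole matrix in each column, which is what column names such as entry[,"entry"] in the printed output show. Reverting to tibble 2.1.3 restores the old behavior.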
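To see whether an installation is affected, the installed version can be checked before deciding whether to downgrade. This is only a minimal sketch: the downgrade line assumes the remotes package is available, and it is left commented out so that nothing gets installed by accident.

```r
# Version of tibble currently installed
tibble_version <- packageVersion("tibble")
print(tibble_version)

# TRUE only for the 3.0.0 release mentioned above
is_affected <- tibble_version == "3.0.0"

# Assumption: the remotes package is installed. Uncomment to downgrade.
# remotes::install_version("tibble", version = "2.1.3")
```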
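Or convert the matrix returned by scale() to a data frame before assigning it:
library(tibble)
iris_tbl <- as_tibble(iris)
# scale() returns a matrix (R >= 4.0 prints "matrix" "array" here)
scaled <- scale(iris_tbl[1:3])
class(scaled)
#> [1] "matrix"
# Convert the matrix to a data frame so each column is assigned separately
iris_tbl[1:3] <- as.data.frame(scaled)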
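Another option, sketched here on the sample rows from the question, is to scale each column separately so that no matrix is ever assigned to the tibble. The per-column lapply() approach is my own suggestion rather than something taken from the tibble documentation, and it discards the "scaled:center" and "scaled:scale" attributes.

```r
library(tibble)

# Sample rows copied from the dput() output in the question
GSMA <- tibble(
  Country = c("GBR", "CHE", "DEU", "ROU", "LUX", "KAZ"),
  entry   = c(98.4974384307861, 95.6549962361654, 91.4044539133708,
              90.8518393834432, 90.4088099797567, 88.0471547444662),
  medium  = c(86.0081672668457, 93.0372142791748, 91.2993144989014,
              100, 96.7348480224609, 100),
  high    = c(74.6774760159579, 84.1793060302734, 79.542350769043,
              99.6931856328791, 97.031680020419, 92.5396745855158)
)

# Reference result: scale() called on its own is correct, as in the question
expected <- scale(GSMA[2:4])

# Scale one column at a time; as.numeric() drops the matrix dimensions
# and the "scaled:*" attributes, so each column stays a plain vector
GSMA[2:4] <- lapply(GSMA[2:4], function(x) as.numeric(scale(x)))

# No column should be a matrix, and the values should match the reference
stopifnot(!any(vapply(GSMA, is.matrix, logical(1))))
stopifnot(isTRUE(all.equal(GSMA$entry, as.numeric(expected[, "entry"]))))
```

If the centering and scaling values are needed later, they can be read from attributes(expected) before the assignment.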