
Why does `scale` create columns with `[,1]` at the end when used with mutate + across in dplyr?


See the code below.

The `mutate(across(everything(), scale, .names = "{.col}_z"))` call is generating columns with `[,1]` appended to their names.

Two questions:

  1. Why is this happening?
  2. How can I avoid or remove it?

library(dplyr)

# Input
df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

# My code generating x_z and y_z
df_scaled <- df_test %>% 
  mutate(across(everything(), scale, .names = "{.col}_z"))

# Output
df_scaled
#> # A tibble: 4 × 4
#>       x     y x_z[,1] y_z[,1]
#>   <dbl> <dbl>   <dbl>   <dbl>
#> 1     1     5  -1.16   -1.16 
#> 2     2     6  -0.387  -0.387
#> 3     3     7   0.387   0.387
#> 4     4     8   1.16    1.16

Expected output

#> # A tibble: 4 × 4
#>       x     y     x_z     y_z
#>   <dbl> <dbl>   <dbl>   <dbl>
#> 1     1     5  -1.16   -1.16 
#> 2     2     6  -0.387  -0.387
#> 3     3     7   0.387   0.387
#> 4     4     8   1.16    1.16

Created on 2022-12-30 with reprex v2.0.2


Solution

  • `scale` returns a matrix. To get a plain vector, we can either drop the `dim` attribute with `c` or `as.numeric`, or extract the single column with `[`

    library(dplyr)
    df_test %>% 
      mutate(across(everything(),
         ~ as.numeric(scale(.x)), .names = "{.col}_z"))
    

    -output

    # A tibble: 4 × 4
          x     y    x_z    y_z
      <dbl> <dbl>  <dbl>  <dbl>
    1     1     5 -1.16  -1.16 
    2     2     6 -0.387 -0.387
    3     3     7  0.387  0.387
    4     4     8  1.16   1.16 
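
    The other two options mentioned above, `c` and `[`, can be sketched the same way. This is a minimal example that assumes the same `df_test` as in the question; the names `df_c` and `df_idx` are only illustrative.

    ```r
    library(dplyr)

    df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

    # Option 1: c() drops the dim attribute of the matrix returned by scale()
    df_c <- df_test %>%
      mutate(across(everything(), ~ c(scale(.x)), .names = "{.col}_z"))

    # Option 2: extract the first (and only) column of the matrix
    df_idx <- df_test %>%
      mutate(across(everything(), ~ scale(.x)[, 1], .names = "{.col}_z"))

    # Neither result contains matrix columns any more
    is.matrix(df_c$x_z)    # FALSE
    is.matrix(df_idx$x_z)  # FALSE
    ```

    Which of the three to use is mostly a matter of taste; `as.numeric` probably states the intent most clearly.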
    

    To see why, check the output of `scale` on a single column:

    > scale(df_test[[1]])
               [,1]
    [1,] -1.1618950
    [2,] -0.3872983
    [3,]  0.3872983
    [4,]  1.1618950
    attr(,"scaled:center")
    [1] 2.5
    attr(,"scaled:scale")
    [1] 1.290994
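
    The `scaled:center` and `scaled:scale` attributes above are the mean and standard deviation of the column, so another way to avoid the matrix is to compute the z-score directly on the vector. A minimal sketch, where `z_score` is a hypothetical helper and not a function from base R or dplyr:

    ```r
    # Hypothetical helper: z-score of a plain vector, no matrix involved
    z_score <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)

    x <- c(1, 2, 3, 4)
    mean(x)  # 2.5, same as attr "scaled:center"
    sd(x)    # 1.290994, same as attr "scaled:scale"

    # Same values as scale(), but the result has no dim attribute
    all.equal(z_score(x), as.numeric(scale(x)))
    is.null(dim(z_score(x)))
    ```

    It could then be passed to `across` in place of `scale`, e.g. `mutate(across(everything(), z_score, .names = "{.col}_z"))`.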
    

    If we check the source code

    > scale.default
    function (x, center = TRUE, scale = TRUE) 
    {
        x <- as.matrix(x) # it is converting to matrix
    ...
    

    The conversion is required because the function then works column-wise with apply/colMeans/sweep. Thus, when we pass a vector to `scale`, it is converted to a single-column matrix:

    > as.matrix(df_test$x)
         [,1]
    [1,]    1
    [2,]    2
    [3,]    3
    [4,]    4
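
    If the data frame already contains such matrix columns, they can also be flattened after the fact. A minimal sketch, assuming the new columns can be selected by their `_z` suffix as in the question; `df_fixed` is only an illustrative name.

    ```r
    library(dplyr)

    df_test <- tibble(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

    # Reproduce the problem: each *_z column is a 4 x 1 matrix
    df_scaled <- df_test %>%
      mutate(across(everything(), scale, .names = "{.col}_z"))
    dim(df_scaled$x_z)  # 4 1

    # Flatten the matrix columns into plain numeric vectors
    df_fixed <- df_scaled %>%
      mutate(across(ends_with("_z"), as.numeric))
    dim(df_fixed$x_z)   # NULL
    ```

    The `[,1]` in the printed header is only how a one-column matrix is displayed; the column names themselves are still `x_z` and `y_z`.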