Search code examples
rdataframetidyversenumericaltibble

R - Combine non numerical and numerical data together in the same cell in a dataframe


I want to combine cell values with numeric and character data into the same cell.

Specifically, I'm trying to combine named columns with the 'SEM' columns, with a ± symbol in between, to eventually become a table I can publish with latex

# A tibble: 4 x 10
  Variety       n Probes Probes SEM Walks Walks SEM Cleans Cleans SEM Off_Leaf Off_SEM
  <chr>     <int>  <dbl>        <dbl> <dbl>       <dbl>  <dbl>        <dbl>    <dbl>   <dbl>
1 10LB mean    41   1.40        0.140 0.710       0.170  0.460       0.140    0.120   0.0520
2 3LB mean     48   1.50        0.130 0.880       0.170  0.310       0.0900   0.190   0.0710
3 4LB mean     43   1.80        0.160 1.10        0.190  0.370       0.120    0.280   0.190 
4 RB mean      44   2.80        0.390 1.50        0.260  0.180       0.0750   0.0910  0.0440

Is there any way to make the previous table look like this:

# A tibble: 4 x 6
  Variety     n Probes    Walks     Cleans     Off Leaf 
  <chr>   <int> <chr>     <chr>     <chr>      <chr>      
1 10LB       41 1.4 ± 0.1 0.7 ± 0.2 0.5 ± 0.1  0.1 ± 0.05 
2 3LB        48 1.5 ± 0.1 0. 9± 0.2 0.3 ± 0.09 0.2 ± 0.07 
3 4LB        43 1.8 ± 0.2 1.1 ± 0.2 0.4 ± 0.1  0.3 ± 0.2  
4 RB         44 2.8 ± 0.4 1.5 ± 0.3 0.2 ± 0.07 0.09 ± 0.04

while remaining in R?


Solution

  • With the following dataset.

    ds <- tibble::tribble(
      ~Variety, ~n, ~Probes, ~`Probes SEM`, ~`Walks`, ~`Walks SEM`, ~`Cleans`, ~`Cleans SEM`, ~`Off_Leaf`, ~`Off_SEM`,
      "10LB mean"   , 41L, 1.40, 0.140, 0.710, 0.170, 0.460, 0.140   , 0.120 ,  0.0520,
      "3LB mean"    , 48L, 1.50, 0.130, 0.880, 0.170, 0.310, 0.0900  , 0.190 ,  0.0710,
      "4LB mean"    , 43L, 1.80, 0.160, 1.10 , 0.190, 0.370, 0.120   , 0.280 ,  0.190 ,
      "RB mean"     , 44L, 2.80, 0.390, 1.50 , 0.260, 0.180, 0.0750  , 0.0910,  0.0440
    )
    

    The \u00b1 character will produce the plus/minus in a way that's hopefully portable across files with different encodings.

    library(magrittr)
    ds %>% 
      dplyr::mutate(
        Probes    = sprintf("%2.1f \u00B1 %3.3f", .data$Probes  , .data$`Probes SEM`),
        Walks     = sprintf("%3.2f \u00B1 %3.3f", .data$Walks   , .data$`Walks SEM` ),
        Cleans    = sprintf("%3.2f \u00B1 %3.3f", .data$Cleans  , .data$`Cleans SEM`),
        Off_Leaf  = sprintf("%3.2f \u00B1 %2.2f", .data$Off_Leaf, .data$Off_SEM     )
      ) %>% 
      dplyr::select(
        -`Probes SEM`, -`Walks SEM`, -`Cleans SEM`, -Off_SEM
      )
    

    Although it's not part of your question, I recommend something like sprintf() to make sure the elements have the same number of digits all the way down a column. It looks better with padded zeros, and it takes some of the burden off LaTeX to align it correctly.

    Output:

    # A tibble: 4 x 6
      Variety       n Probes      Walks        Cleans       Off_Leaf   
      <chr>     <int> <chr>       <chr>        <chr>        <chr>      
    1 10LB mean    41 1.4 ± 0.140 0.71 ± 0.170 0.46 ± 0.140 0.12 ± 0.05
    2 3LB mean     48 1.5 ± 0.130 0.88 ± 0.170 0.31 ± 0.090 0.19 ± 0.07
    3 4LB mean     43 1.8 ± 0.160 1.10 ± 0.190 0.37 ± 0.120 0.28 ± 0.19
    4 RB mean      44 2.8 ± 0.390 1.50 ± 0.260 0.18 ± 0.075 0.09 ± 0.04
    

    Also be aware of the LaTeX \pm command if you use a table-formatting package like kableExtra or xtable that may process the unicode different, but allows you to escape the \pm.