r dplyr data-manipulation

Replacing all non-numeric characters in certain columns in R


How can I remove all non-numeric characters from every column except "a"?

Simulated data

library(tidyverse)

d = tibble(a = c("Tom", "Mary", "Ben", "Jane", "Lucas", "Mark"),
           b = c("8P", "3", "6", "7", "5M", "U1"),
           c = c("2", "12", "6F", "7F", "Y1", "9I"))

d


Expected output should look as follows

[image: expected output table]

Tidyverse solutions are especially appreciated!


Solution

  • You could use across() inside mutate() to apply the change to every column except a. A regex inside str_extract() pulls out the first run of digits in each value, and as.numeric() converts the result to a numeric type.

    library(tidyverse)
    
    d |> 
      mutate(across(-a, ~ . |> str_extract("\\d+") |> as.numeric()))
    

    Output:

    # A tibble: 6 × 3
      a         b     c
      <chr> <dbl> <dbl>
    1 Tom       8     2
    2 Mary      3    12
    3 Ben       6     6
    4 Jane      7     7
    5 Lucas     5     1
    6 Mark      1     9
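
Because str_extract("\\d+") keeps only the first run of digits, it matches the question's sample data but would truncate a value whose digits are split by other characters (a hypothetical "1A2", which does not appear in the question). A minimal sketch of a variant that deletes every non-digit character with str_remove_all() instead is shown below. The name cleaned is only illustrative, and the pattern "\\D" would also strip decimal points and minus signs, so this assumes the columns hold non-negative whole numbers.

```r
library(dplyr)    # mutate(), across(), and the re-exported tibble()
library(stringr)  # str_remove_all(), str_extract()

# Same simulated data as in the question
d <- tibble(a = c("Tom", "Mary", "Ben", "Jane", "Lucas", "Mark"),
            b = c("8P", "3", "6", "7", "5M", "U1"),
            c = c("2", "12", "6F", "7F", "Y1", "9I"))

# Delete every non-digit character ("\\D") in all columns except a,
# then convert what is left to numeric
cleaned <- d |>
  mutate(across(-a, ~ as.numeric(str_remove_all(.x, "\\D"))))

cleaned

# How the two approaches differ on a hypothetical value with split digits
str_extract("1A2", "\\d+")     # "1"
str_remove_all("1A2", "\\D")   # "12"
```

On the sample data both approaches give the same result, so the choice only matters if digits can be interrupted by other characters.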