Search code examples
rloopsdata-cleaning

How to apply same operation to multiple columns spread out in dataframe in R?


beginner here! I have a dataframe with over 600 columns. I want to apply some simple operations to specific columns that are all named similarly (see example below) but spread out throughout the dataframe. How can I build a loop for all these columns based on their names and the ascending numbers at the end of the column name?

Dataframe example:

Name  Col_x_1  Company  Col_x_2  Start_Year  End_Year  Col_x_3
asd   4 Col    Test     2 Col    1902        1933      1 Col
kfj   5 Cols   Test_2   10 Col   1933        1954      0
ale   0        Test_3   11 Cols  1988        1999      5 Col
...

Output example: I want to remove the string following the space behind the value in the columns named "Col_x_1, Col_x_2, ..." (going up until 56) and turn the columns into dbl.

Name  Col_x_1  Company  Col_x_2  Start_Year  End_Year  Col_x_3
asd   4        Test     2        1902        1933      1 
kfj   5        Test_2   10       1933        1954      0
ale   0        Test_3   11       1988        1999      5 
...

How can I build a loop to perform this simple operation? Thank you in advance!


Solution

  • You could use tidyverse:

    library(tidyverse)
    
    df %>% 
      mutate(across(starts_with("Col_x"), ~str_extract(., "^\\d+") %>% as.numeric()))
    

    This returns

    # A tibble: 3 × 7
      Name  Col_x_1 Company Col_x_2 Start_Year End_Year Col_x_3
      <chr>   <dbl> <chr>     <dbl>      <dbl>    <dbl>   <dbl>
    1 asd         4 Test          2       1902     1933       1
    2 kfj         5 Test_2       10       1933     1954       0
    3 ale         0 Test_3       11       1988     1999       5
    

    Or, depending on your actual data

    df %>% 
      mutate(across(starts_with("Col_x"), ~str_remove_all(., "(?<=\\d)\\s+[A-z]+") %>% as.numeric()))
    

    The first one extracts the starting number (assuming it's an integer) and removes the remaining content. The second one looks for a number followed by a space and some letters and removes them.

    Finally both convert the remaining part into a numeric.

    We use starts_with() to select the columns by a names pattern. There are other possibilities like ends_with(), matches() or contains(). The selection of the appropriate function depends on the actual structure of the names.