r function dataframe if-statement extract

conditional extraction of data from dataframe in R


I have the data frame shown below. For each car, I want to pull the value from the column that matches its age. For example, the Aston Martin is 4 years old, so I need the value from the hub.year4_6 column. The first Honda is 7 years old, so I need the value from hub.year7_11.

I tried to write an if-else statement but couldn't get it to work. What kind of code could I write for this? Should I use a package?

hub.cc     hub.make hub.age hub.year1_3 hub.year4_6 hub.year7_11
  1300 Aston Martin       4         480         335          189
  1800       Toyota       1        1352        1059          624
  2000        Honda       7        2129        1642          965
  1600        Honda       9         768         576          335

Solution

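  • You can build the column name as a string with paste() and then select that column. Two things get in the way: the suffixes _3, _6 and _11 are inconvenient, and an age such as 9 does not appear in any column name. So I first round the age down to the start of its bracket (1, 4 or 7) with findInterval(), and then use the starts_with() helper from the tidyverse packages to match the column.

    For row 3:

    library(tidyverse)
    age = df[3, "hub.age"]
    # Round the age down to the start of its bracket (1, 4 or 7); assumes ages from 1 to 11
    bracket_start = c(1, 4, 7)[findInterval(age, c(1, 4, 7))]
    column_name = paste("hub.year", bracket_start, sep="")
    # e.g. starts_with("hub.year7") matches hub.year7_11
    data = df %>% 
           slice(3) %>% 
           select(starts_with(column_name))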
    

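    If you drop the suffixes (so your columns are named hub.year1, hub.year4 and hub.year7), it simplifies a lot. Then you can use the base R syntax for column referencing: df[, 'column_name']. The age still has to be rounded down to the start of its bracket, because an age such as 9 has no column of its own.

    For row 3 again:

    age = df[3, "hub.age"]
    # Round the age down to 1, 4 or 7 so that it matches a column name
    bracket_start = c(1, 4, 7)[findInterval(age, c(1, 4, 7))]
    column_name = paste("hub.year", bracket_start, sep="")
    data = df[3, column_name]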
    

    If the data is small, you can iterate over the data frame with a for loop, replacing 3 with i, to get the data for all rows. If the data is big, you might want to use something vectorised such as mutate() from the tidyverse.

    Note: I did not test the code above since you did not provide the data in a convenient reproducible way using dput. If the answer does not meet your expectations, please clarify and provide the data e.g. with dput.
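
The all-rows case mentioned above can also be handled without a loop. The following is a minimal base R sketch, not a tested answer to the original data: it rebuilds the data frame from the table in the question, assumes the brackets are 1 to 3, 4 to 6 and 7 to 11 years, and stores the result in a new column named hub.value (a name chosen here for illustration). It uses findInterval() to turn each age into a bracket number and matrix indexing to pick one value per row.

```r
# Data rebuilt by hand from the table in the question
df <- data.frame(
  hub.cc       = c(1300, 1800, 2000, 1600),
  hub.make     = c("Aston Martin", "Toyota", "Honda", "Honda"),
  hub.age      = c(4, 1, 7, 9),
  hub.year1_3  = c(480, 1352, 2129, 768),
  hub.year4_6  = c(335, 1059, 1642, 576),
  hub.year7_11 = c(189, 624, 965, 335)
)

# Columns ordered by bracket: 1-3, 4-6, 7-11 (assumed meaning of the suffixes)
year_cols <- c("hub.year1_3", "hub.year4_6", "hub.year7_11")

# Bracket number (1, 2 or 3) for every row at once
bracket <- findInterval(df$hub.age, c(1, 4, 7))

# Ages outside 1..11 have no matching column, so mark them as NA
bracket[df$hub.age < 1 | df$hub.age > 11] <- NA

# Matrix indexing: one (row, column) pair per row of the data frame
year_values <- as.matrix(df[year_cols])
df$hub.value <- year_values[cbind(seq_len(nrow(df)), bracket)]

print(df[, c("hub.make", "hub.age", "hub.value")])
```

With the four rows above, hub.value should come out as 335, 1352, 965 and 335. If you prefer the tidyverse, the same bracket logic could probably be written inside mutate() with case_when(), one condition per age range.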