Search code examples
rif-statementminimum

R test if value is lowest from group, add 'yes'/'no' in new column if value is lowest from group


I'm relatively new to R and running into a problem I can't seem to solve. My apologies if this question has been asked before, but answers related to 'finding lowest' I'm running into here seem to focus on extracting the lowest value, I haven't found much about using it as a condition to add new values to a column.

A simplified example of what I'm trying to achieve is below. I have a list of building names and the years they have been in use, and I want to add to the column first_year "yes" and "no" depending on if the year the building is in use is the first year or not.

building_name   year_inuse    first_year
office          2020          yes 
office          2021          no 
office          2022          no
office          2023          no 
house           2020          yes
house           2021          no
house           2022          no
house           2023          no
retail          2020          yes
retail          2021          no
retail          2022          no
retail          2023          no

I grouped the data by the building names, and now I'm thinking about doing something like:

data_new <- data %>% mutate(first_year = if_else(...., "yes", "no"))

so add a condition in the if_else that tests if the year is the lowest from the group, and if so add a yes, otherwise add a no. However, I can't seem to figure out how to do this and if this is even the best approach.

Help is much appreciated.


Solution

  • If the 'year_inuse' is not ordered, use arrange before doing this i.e. arrange by 'building_name', 'year_inuse', create a logical vector with duplicated, convert it to numeric index (1 + ), then use that index to replace with a vector of values i.e. 'yes', 'no'

    library(dplyr)
    data_new <- data %>%
            arrange(building_name, year_inuse) %>%
            mutate(first_year = c("no", "yes")[1 + !duplicated(building_name)])
    

    -ouptut

    #    building_name year_inuse first_year
    #1          house       2020        yes
    #2          house       2021         no
    #3          house       2022         no
    #4          house       2023         no
    #5         office       2020        yes
    #6         office       2021         no
    #7         office       2022         no
    #8         office       2023         no
    #9         retail       2020        yes
    #10        retail       2021         no
    #11        retail       2022         no
    #12        retail       2023         no
    

    data

    data <- structure(list(building_name = c("office", "office", "office", 
    "office", "house", "house", "house", "house", "retail", "retail", 
    "retail", "retail"), year_inuse = c(2020L, 2021L, 2022L, 2023L, 
    2020L, 2021L, 2022L, 2023L, 2020L, 2021L, 2022L, 2023L)),
     row.names = c(NA, 
    -12L), class = "data.frame")