Search code examples
rdata-manipulation

Create a new column based on existing multiple columns and rows in R


I have been struggling to create a new column based on the values of certain rows and columns in another dataframe. I have a data frame that looks like this:

sample_ID date test_result
sample1 1/1/2022 positive
sample1 1/1/2022 negative
sample1 1/1/2022 negative
sample2 2/1/2022 positive
sample2 3/1/2022 negative
sample3 4/1/2022 negative
sample3 5/1/2022 positive
sample4 5/1/2022 negative
sample4 6/1/2022 negative
sample4 7/1/2022 negative

I want to create a new column with a decision of the final result of each sample ID. If sample is positive at any date, the final result will be the test result of the earliest date of positivity, otherwise the sample is negative. The results should look like this:

sample_ID date test_result final_result
sample1 1/1/2022 positive positive
sample1 1/1/2022 negative positive
sample1 1/1/2022 negative positive
sample2 2/1/2022 positive positive
sample2 3/1/2022 negative positive
sample3 4/1/2022 negative positive
sample3 5/1/2022 positive positive
sample4 5/1/2022 negative negative
sample4 6/1/2022 negative negative
sample4 7/1/2022 negative negative

I did try with ifelse and loop but it was not successful. I would appreciate any help. Thank you very much.


Solution

  • You can use the library dplyr for this, doing a group_by on sample_ID and checking if any of the corresponding values was positive:

    library(dplyr)
    
    data = "sample_ID   date    test_result
                     sample1    1/1/2022    positive
                     sample1    1/1/2022    negative
                     sample1    1/1/2022    negative
                     sample2    2/1/2022    positive
                     sample2    3/1/2022    negative
                     sample3    4/1/2022    negative
                     sample3    5/1/2022    positive
                     sample4    5/1/2022    negative
                     sample4    6/1/2022    negative
                     sample4    7/1/2022    negative"
    
    df <- read.table(text=data, sep="\t", header = TRUE)
    
    df %>% 
      group_by(sample_ID) %>% 
      mutate(final_result = ifelse(any(test_result == 'positive'),'positive','negative')) %>% 
      ungroup()