r, substring, data-cleaning

Add a substring based on a condition in R


I would like to add a substring at the end of a string variable in my data. This variable represents a question asked during a specific case, and each question can be irrelevant ("00"), partially relevant ("01"), or very relevant ("02"). I would therefore like to append the suffix conditionally, based on the value of the case ID. Here is some sample data:

Case_ID Question
1234 QS1
4321 QS1
1234 QS3
1234 QS2
4321 QS3
4321 QS2

Where: Case_ID 1234: QS1 very relevant ("02"), QS2 irrelevant ("00"), QS3 irrelevant ("00"); and Case_ID 4321: QS1 irrelevant ("00"), QS2 partially relevant ("01"), QS3 very relevant ("02").

I hope to receive the following output:

Case_ID Question
1234 QS102
4321 QS100
1234 QS300
1234 QS200
4321 QS302
4321 QS201

Solution

  • You can write a very 'by hand' solution with nested if statements:

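    library(dplyr)  # provides tibble(), %>%, rowwise() and mutate()

    # Sample data from the question
    df <- tibble(
      Case_ID = c(1234, 4321, 1234, 1234, 4321, 4321),
      Question = c("QS1", "QS1", "QS3", "QS2", "QS3", "QS2"))

    # Return the relevance code for a single (case, question) pair.
    # Any case other than 1234 is treated as case 4321.
    relevance <- function(case, question) {
      if (case == 1234) {
        if (question == "QS1") "02" else "00"
      } else {
        if (question == "QS1") "00"
        else if (question == "QS2") "01"
        else "02"
      }
    }

    # relevance() is not vectorised, so evaluate it one row at a time
    df %>%
      rowwise() %>%
      mutate(Question = paste0(Question, relevance(Case_ID, Question)))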
    
    # A tibble: 6 x 2
    # Rowwise: 
      Case_ID Question
        <dbl> <chr>   
    1    1234 QS102   
    2    4321 QS100   
    3    1234 QS300   
    4    1234 QS200   
    5    4321 QS302   
    6    4321 QS201
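
  • If the number of cases or questions grows, the nested if statements become hard to maintain. One alternative, shown here as a minimal sketch and not as part of the original answer, is to keep the relevance codes in a lookup table and join it to the data. The names `relevance_lookup`, `Relevance` and `result` are made up for illustration; the codes themselves come from the question.

```r
library(dplyr)

# Sample data from the question
df <- tibble(
  Case_ID  = c(1234, 4321, 1234, 1234, 4321, 4321),
  Question = c("QS1", "QS1", "QS3", "QS2", "QS3", "QS2"))

# Hypothetical lookup table: one row per (Case_ID, Question) pair,
# holding the relevance code stated in the question
relevance_lookup <- tibble(
  Case_ID   = c(1234, 1234, 1234, 4321, 4321, 4321),
  Question  = c("QS1", "QS2", "QS3", "QS1", "QS2", "QS3"),
  Relevance = c("02", "00", "00", "00", "01", "02"))

result <- df %>%
  # left_join() keeps every row of df, in its original order
  left_join(relevance_lookup, by = c("Case_ID", "Question")) %>%
  # append the code to the question label, then drop the helper column
  mutate(Question = paste0(Question, Relevance)) %>%
  select(-Relevance)

print(result)
```

    This avoids rowwise() because the join and paste0() are both vectorised. Note that a (Case_ID, Question) pair missing from the lookup table would get an NA code, which paste0() turns into the literal text "NA", so it is worth checking for unmatched rows before pasting.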