Tags: r, function, if-statement, datediff

R - created function does not output correctly


I am attempting to find the difference between two dates and then group that value into factor levels. I have done this before with other numeric values, but not with dates, and I can't figure out what I am doing incorrectly. Creating the function produces no errors, but neither of the two ways I have tried to apply it works.

I originally calculated it in days because I need a day value later on. Grouping it into weeks is to create some levels for visualization later.

#create Lead_Time column to calculate how far in advance the appointment was booked
#formatted in days
df7$Lead_Time <- difftime(df7$Appointment_Date_Time, df7$appt_create_date, units = "days")
#convert to whole days; the value stays negative when the appointment was created after its start time
df7$Lead_Time <- as.integer(df7$Lead_Time)

#group Lead_Time by weeks
group_Lead_Time <- function(Lead_Time){
  if (Lead_Time <= 28){
    return('0-4 Weeks')
  }else if(Lead_Time > 29 & Lead_Time <= 56){
    return('5-8 Weeks')
  }else if (Lead_Time > 57 & Lead_Time <= 84){
    return('8-12 Weeks')
  }else if (Lead_Time > 85 & Lead_Time <= 112){
    return('12-16 Weeks')
  }else if (Lead_Time > 113 & Lead_Time <=140){
    return('16-20 Weeks')  
  }else if (Lead_Time > 141 & Lead_Time <=168){
    return('20-24 Weeks')  
  }else if (Lead_Time > 168){
    return('24+ Weeks')
  }
}
df7$Lead_Time_Grouped <- as.factor(group_Lead_Time(df7$Lead_Time))
df7$Lead_Time_Grouped <- sapply(df7$Lead_Time,group_Lead_Time)

If someone has a better way to handle the negative values, I am open to that as well. These are the warnings and the error I get:

> df7$Lead_Time_Grouped <- as.factor(group_Lead_Time(df7$Lead_Time))
Warning messages:
1: In if (Lead_Time <= 28) { :
  the condition has length > 1 and only the first element will be used
2: In if (Lead_Time > 29 & Lead_Time <= 56) { :
  the condition has length > 1 and only the first element will be used
3: In if (Lead_Time > 57 & Lead_Time <= 84) { :
  the condition has length > 1 and only the first element will be used
4: In if (Lead_Time > 85 & Lead_Time <= 112) { :
  the condition has length > 1 and only the first element will be used
> df7$Lead_Time_Grouped <- sapply(df7$Lead_Time,group_Lead_Time)
Error in if (Lead_Time <= 28) { : missing value where TRUE/FALSE needed
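
For context on the messages above: `if` expects a single TRUE/FALSE, so passing the whole column only tests the first element (in R 4.2.0 and later this is an error rather than a warning). With `sapply` each element is tested on its own, but a missing lead time makes the comparison evaluate to `NA`, which `if` cannot branch on. A minimal sketch of both behaviours, using a made-up vector `lead_time` that is not part of the original data:

```r
# Hypothetical lead times in days, including a missing value
lead_time <- c(10, 40, NA)

# Comparison operators are vectorized: one result per element
lead_time <= 28

# `if` needs a single non-missing TRUE/FALSE, so an NA condition fails;
# tryCatch() captures the error message instead of stopping the script
na_result <- tryCatch(
  if (NA <= 28) "0-4 Weeks",
  error = function(e) conditionMessage(e)
)
na_result

# ifelse() is vectorized and propagates NA instead of failing
grouped <- ifelse(lead_time <= 28, "0-4 Weeks", "More than 4 Weeks")
grouped
```

This is why a vectorized function such as `ifelse` fits a whole column better than a chain of `if`/`else if`.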

UPDATE/EDIT: Thanks for pointing me in the direction of ifelse. I was able to resolve my problem with the code below.

#group Lead_Time by weeks
#each lower bound starts right after the previous upper bound, so that the day
#values 29, 57, 85, 113 and 141 do not fall through to '24+ Weeks'
group_Lead_Time <- function(appt_lead_time){
  ifelse (appt_lead_time <= 28, '0-4 Weeks',
          ifelse (appt_lead_time > 28 & appt_lead_time <= 56, '5-8 Weeks',
                  ifelse (appt_lead_time > 56 & appt_lead_time <= 84, '8-12 Weeks',
                          ifelse (appt_lead_time > 84 & appt_lead_time <= 112, '12-16 Weeks',
                                  ifelse (appt_lead_time > 112 & appt_lead_time <= 140, '16-20 Weeks',
                                          ifelse (appt_lead_time > 140 & appt_lead_time <= 168, '20-24 Weeks',
                                                  '24+ Weeks'))))))
}

df7$appt_lead_time_weeks <- group_Lead_Time(df7$appt_lead_time)
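
On the negative lead times mentioned earlier: `as.integer()` only truncates the value to whole days and keeps its sign, so it does not remove negatives by itself. Two common options are clamping them to zero or marking them as missing. A rough sketch with made-up timestamps (`appt_time` and `create_time` are hypothetical stand-ins for the two date columns):

```r
# Hypothetical timestamps; the second appointment was logged after it started
appt_time   <- as.POSIXct(c("2024-03-10 09:00", "2024-03-01 09:00"), tz = "UTC")
create_time <- as.POSIXct(c("2024-03-01 09:00", "2024-03-03 09:00"), tz = "UTC")

# Lead time in days as a plain numeric vector
lead_days <- as.numeric(difftime(appt_time, create_time, units = "days"))

# Option 1: clamp negative lead times to zero
lead_clamped <- pmax(lead_days, 0)

# Option 2: treat negative lead times as missing
lead_na <- ifelse(lead_days < 0, NA, lead_days)
```

Which option is appropriate depends on whether a booking created after the start time should count as "no lead time" or as bad data.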

Solution

  • With help from the comments I was able to come up with the solution below:

    #group Lead_Time by weeks
    #each lower bound starts right after the previous upper bound, so that the day
    #values 29, 57, 85, 113 and 141 do not fall through to '24+ Weeks'
    group_Lead_Time <- function(appt_lead_time){
      ifelse (appt_lead_time <= 28, '0-4 Weeks',
      ifelse (appt_lead_time > 28 & appt_lead_time <= 56, '5-8 Weeks',
      ifelse (appt_lead_time > 56 & appt_lead_time <= 84, '8-12 Weeks',
      ifelse (appt_lead_time > 84 & appt_lead_time <= 112, '12-16 Weeks',
      ifelse (appt_lead_time > 112 & appt_lead_time <= 140, '16-20 Weeks',
      ifelse (appt_lead_time > 140 & appt_lead_time <= 168, '20-24 Weeks',
                                                  '24+ Weeks'))))))
    }
    
    df7$appt_lead_time_weeks <- group_Lead_Time(df7$appt_lead_time)
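
As a possible alternative to nesting `ifelse`, base R's `cut()` bins a numeric vector in one call and returns a factor directly. The sketch below is not the asker's code: the vector `appt_lead_time` is made-up test data, the break points assume 4-week bins with inclusive upper edges, and the labels are copied from the solution above.

```r
# Hypothetical lead times in days, including boundary values and an NA
appt_lead_time <- c(0, 28, 29, 56, 57, 168, 169, NA)

# Upper edge of each 4-week bin; right = TRUE makes each interval (lower, upper]
breaks <- c(-Inf, 28, 56, 84, 112, 140, 168, Inf)
labels <- c("0-4 Weeks", "5-8 Weeks", "8-12 Weeks", "12-16 Weeks",
            "16-20 Weeks", "20-24 Weeks", "24+ Weeks")

# ordered_result = TRUE keeps the levels in bin order for plotting
lead_time_weeks <- cut(appt_lead_time, breaks = breaks, labels = labels,
                       right = TRUE, ordered_result = TRUE)
as.character(lead_time_weeks)
```

Because the bins share their edges, no day value can fall between two groups, and missing lead times stay `NA` instead of landing in a bin.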