How to compare the first character of a date variable with a number in R


I have a column called Start_date (of type Date), and I want to check the first digit of each entry and compare it to a number.

I have achieved the result I want already using the following code:

filter(tripRawData_cleaned, Start_date < "2022-01-01")

This allowed me to find all rows older than the date.

Now I want to do this using a second method: take the first character of each Start_date value, compare it to '2', and select all the rows whose Start_date does not start with '2'.

millenium <- substr((tripRawData_cleaned$Start_date),1,1)
millenium <- as.numeric(millenium)
str(millenium)
if(millenium < 2){
  print(tripRawData_cleaned$Start_date)
}

But this isn't working. The error message is: 'Error in if (millenium < 2) { : the condition has length > 1'. Thanks


Solution

  • if() is not vectorized: it expects a single TRUE or FALSE, but millenium < 2 produces one value per row. One solution is to use a for() loop:

    tripRawData_cleaned<-data.frame(Start_date=as.Date("1999-12-30")+0:3,
                                    Value=0:3)
    
    tripRawData_cleaned
    #  Start_date Value
    #1 1999-12-30     0
    #2 1999-12-31     1
    #3 2000-01-01     2
    #4 2000-01-02     3
    
    for (i in seq_len(nrow(tripRawData_cleaned))){
      if(substr(tripRawData_cleaned[i,"Start_date"],1,1) < "2"){
        print(tripRawData_cleaned[i,"Start_date"])
      }  
    }
    
    #[1] "1999-12-30"
    #[1] "1999-12-31"
    

    A faster and simpler vectorized method is:

    tripRawData_cleaned[substr(tripRawData_cleaned$Start_date,1,1) != "2",]
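To make the "not vectorized" point concrete, here is a minimal base-R sketch that reuses the toy data frame from the answer. It shows that the comparison returns one logical value per row, how any() can collapse that to the single value if() expects, and a possible alternative that compares the numeric year instead of the first character. The names first_char, is_before_2000, start_year and rows_by_year are made up for illustration, and the year-based variant only matches the first-character test for years between 1000 and 2999.

```r
# Toy data copied from the answer above
tripRawData_cleaned <- data.frame(Start_date = as.Date("1999-12-30") + 0:3,
                                  Value = 0:3)

# substr() returns one character per row, so the comparison below
# yields a logical vector of length 4, not a single TRUE/FALSE
first_char <- substr(as.character(tripRawData_cleaned$Start_date), 1, 1)
is_before_2000 <- first_char != "2"
is_before_2000
# [1]  TRUE  TRUE FALSE FALSE

# if() needs a single value; any() (or all()) collapses the vector to one
if (any(is_before_2000)) {
  print(tripRawData_cleaned$Start_date[is_before_2000])
}

# Illustrative alternative (an assumption, not part of the original answer):
# extract the year as an integer and compare numerically
start_year <- as.integer(format(tripRawData_cleaned$Start_date, "%Y"))
rows_by_year <- tripRawData_cleaned[start_year < 2000, ]
```

The same logical vector can be used anywhere a row filter is expected, so it should also work as the condition inside the filter() call from the question.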