Search code examples
rdateshinyxlsx

R messy date to convert into proper year month sequence


I have a date column in xlsx with no sequence of dates. I would like to convert the date column so it appears in sequence starting year from 08-23 for example 4 Aug becomes 04-08-2023 and so on as the data added in for year 2024 onwards.

What i have tried from source Working with dates

FixDate-> as.Date(xlsx_filePath$Date, origin = "4 Aug")
openxlsx::convertToDate(xlsx_filePath$Date)

or

FixDate-> as.Date(xlsx_filePath$Date,  format = "%d/%m/%Y")
openxlsx::convertToDate(xlsx_filePath$Date)
Date Firm
1 Jan firm9
8 Jan firm10
1 Dec firm1
8 Dec firm2
14 Dec firm3
4 Aug firm4
11 Aug firm5
25 Aug firm6
1 Sep firm7
8 Sep firm8
14 Jan firm8

Expected Outcome

Date Firm
04-08-2023 firm4
11-08-2023 firm5
25-08-2023 firm6
01-09-2023 firm1
08-09-2023 firm2
01-12-2023 firm7
08-12-2023 firm8
14-12-2023 firm3
01-01-2024 firm9
08-01-2024 firm10
14-01-2024 firm10

Solution

  • This may help you with your problem. It may be not the most elegant solutions but it works. I used parse_date to get the day-months to a date. This will automatically add the current year (2024) to these dates. Then I used lubridate to substract 1 year from all dates later than 4 august. And then another mutate step to get the dates to the same format as your example.

        library(dplyr)
        library(lubridate)
        library(parsedate)
        
        dt <- c("1 Jan", "8 Jan", "1 Dec", "8 Dec", "14 Dec", "4 Aug", "11 Aug", 
                 "25 Aug", "1 Sep", "8 Sep", "14 Jan")
        firm <- c("firm9", "firm10","firm1", "firm2", "firm3", "firm4", "firm5",
                  "firm6", "firm7", "firm8", "firm8")
        df <- data.frame(dt, firm)
    
        df %>% mutate(dates = parse_date(dt)) %>% 
               mutate(dates = case_when(
                              dates >= "2024-08-04" ~ dates - years(1),
                              TRUE ~ dates)) %>% 
               mutate(dates = format(dates, "%d-%m-%Y"))
    
    
    
           dt   firm      dates
    1   1 Jan  firm9 01-01-2024
    2   8 Jan firm10 08-01-2024
    3   1 Dec  firm1 01-12-2023
    4   8 Dec  firm2 08-12-2023
    5  14 Dec  firm3 14-12-2023
    6   4 Aug  firm4 04-08-2023
    7  11 Aug  firm5 11-08-2023
    8  25 Aug  firm6 25-08-2023
    9   1 Sep  firm7 01-09-2023
    10  8 Sep  firm8 08-09-2023
    11 14 Jan  firm8 14-01-2024
    

    As a sidenote: I noticed that the date - firm combinations in the start table and the expected outcome table do not match up. I used the data from the start table to generate this outcome.