Tags: r, time-series, lubridate, factors

How to use an ordered factor of month names with lubridate functions such as ymd() and month() to set date components in R


I calculated the number of observations per month per year using dplyr. To ensure the months are in the correct order from January to December, I converted Month to an ordered factor.

I want to use lubridate functions such as ymd() and month() to set the year and month components correctly for a time series analysis.

The lubridate function month() cannot process ordered factors (see the R code and error message below). I tried unordering this column using x <- factor(x, ordered = FALSE), but I lost all the information in the data frame except for Month.

I tried setting the column 'Month' to basic factor levels, but I got this output:

 Bulbs$Month <- as.factor(Bulbs$Month)    

Error in $<-.data.frame(*tmp*, Month, value = integer(0)) : replacement has 0 rows, data has 96

Does anyone know how to convert an ordered factor back to a normal factor but not lose the levels of ordering?

Structure of the data frame after the calculation with dplyr:

'data.frame':   96 obs. of  4 variables:
 $ Year               : num  2012 2012 2012 2012 2012 ...
 $ Month              : Ord.factor w/ 12 levels "January"<"February"<..: 1 2 4 5 6 7 10 11 12 2 ...
 $ Number_Daffodils   : num  1 8 18 21 27 12 12 4 3 2 ...
 $ Frequency_New_Bulbs: num  7 59 144 193 NA NA 143 22 14 26 ...

R code:

library(dplyr)
library(lubridate)

Bulbs <- MyDf %>% 
  mutate(Month = factor(trimws(Month), levels = month.name, ordered = TRUE)) %>% 
  group_by(Year, Month) %>% 
  summarise(N = n(), Frequency_New_Bulbs = sum(Number_Daffodils))

#Set the components for the time series analysis

Bulbs <- janitor::clean_names(Bulbs)
Bulbs$Year <- lubridate::ymd(paste(Bulbs$year, Bulbs$month, "01", sep = "-"))
Bulbs$month = lubridate::month(Bulbs$month)

When I run the line Bulbs$month = lubridate::month(Bulbs$month), I get this error message:

Error in as.POSIXlt.character(as.character(x), ...) : 
  character string is not in a standard unambiguous format
In addition: Warning message:
tz(): Don't know how to compute timezone for object of class ordered/factor; returning "UTC". 

Dummy Dataframe

MyDf <- tibble(
  Month = sample(month.name, 120, replace = TRUE),
  Year = sample(2012:2024, 120, replace = TRUE),
  Number_Daffodils = sample(1:5, 120, replace = TRUE)
)

Desired Output

  year    month Number_Daffodils Frequency_New_Bulbs       date n_month
1 2015  January               36                  31 2015-01-01       1
2 2015 February               28                  28 2015-02-01       2
3 2015    March               39                  31 2015-03-01       3
4 2015    April               46                  30 2015-04-01       4
5 2015      May                5                   6 2015-05-01       5
6 2015     June                0                   0 2015-06-01       6

Solution

  • If the levels of your Month factor are in calendar order (as they are with levels = month.name), you can convert it to an integer month number with as.integer() or use it directly with lubridate::make_date():

    library(dplyr)
    
    Bulbs |> 
      janitor::clean_names() |> 
      mutate(date = lubridate::make_date(year = year, month = month),
             m = as.integer(month))
    #> # A tibble: 86 × 6
    #> # Groups:   year [13]
    #>     year month         n frequency_new_bulbs date           m
    #>    <int> <ord>     <int>               <int> <date>     <int>
    #>  1  2012 January       1                   2 2012-01-01     1
    #>  2  2012 February      4                   9 2012-02-01     2
    #>  3  2012 April         1                   4 2012-04-01     4
    #>  4  2012 May           3                  10 2012-05-01     5
    #>  5  2012 June          1                   2 2012-06-01     6
    #>  6  2012 July          1                   2 2012-07-01     7
    #>  7  2012 August        2                   6 2012-08-01     8
    #>  8  2012 September     1                   2 2012-09-01     9
    #>  9  2012 October       1                   3 2012-10-01    10
    #> 10  2012 November      2                   9 2012-11-01    11
    #> # ℹ 76 more rows
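
For comparison, here is a minimal base R sketch of the same idea that needs no packages. It assumes the factor levels are exactly month.name in calendar order; the toy data frame bulbs and the variable month_plain are hypothetical names used only for illustration. It also shows one way to drop the ordered class while keeping every level in place, by passing the existing levels explicitly.

```r
# Hypothetical toy data standing in for the summarised data frame
bulbs <- data.frame(
  year  = c(2015, 2015, 2015),
  month = factor(c("January", "March", "December"),
                 levels = month.name, ordered = TRUE)
)

# Drop the "ordered" class; passing levels explicitly keeps all 12 levels
# (including unused ones) in their original order
month_plain <- factor(bulbs$month, levels = levels(bulbs$month), ordered = FALSE)

# as.integer() on a factor returns the level index, which equals the
# calendar month number when the levels are month.name
bulbs$n_month <- as.integer(bulbs$month)

# Build a first-of-month Date from the numeric year and month number
bulbs$date <- as.Date(sprintf("%d-%02d-01", as.integer(bulbs$year), bulbs$n_month))

print(bulbs)
```

Because the month number comes from the level index rather than from parsing the month name, this only gives correct results when the levels really are in January to December order.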