I calculated the number of observations per month per year using dplyr and, to keep the months in the correct order from January to December, converted Month to an ordered factor. I want to use the lubridate functions ymd() and month() to set the year and month components correctly for a time series analysis. However, lubridate::month() cannot process ordered factors (see the R code and error message below). I tried unordering this column using x <- factor(x, ordered = FALSE), but I lost all the information in the data frame except for Month.
I also tried setting the column 'Month' to basic factor levels, but I got this error:
Bulbs$Month <- as.factor(Bulbs$Month)
Error in `$<-.data.frame`(`*tmp*`, Month, value = integer(0)) :
  replacement has 0 rows, data has 96
Does anyone know how to convert an ordered factor back to a normal factor without losing the order of the levels?
Structure of the data frame after the calculation with dplyr:
'data.frame': 96 obs. of 4 variables:
$ Year : num 2012 2012 2012 2012 2012 ...
$ Month : Ord.factor w/ 12 levels "January"<"February"<..: 1 2 4 5 6 7 10 11 12 2 ...
$ Number_Daffodils : num 1 8 18 21 27 12 12 4 3 2 ...
$ Frequency_New_Bulbs : num 7 59 144 193 NA NA 143 22 14 26 ..
R code:
library(dplyr)
library(lubridate)
Bulbs <- MyDf %>% mutate(Month = factor(trimws(Month), levels = month.name, ordered = TRUE)) %>%
group_by(Year, Month) %>%
summarise(N = n(), Frequency_New_Bulbs = sum(Number_Daffodils))
#Set the components for the time series analysis
Bulbs <- janitor::clean_names(Bulbs)
Bulbs$Year <- lubridate::ymd(paste(Bulbs$year, Bulbs$month, "01", sep = "-"))
Bulbs$month = lubridate::month(Bulbs$month)
#When I run the line Bulbs$month = lubridate::month(Bulbs$month), I get this error message:
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
In addition: Warning message:
tz(): Don't know how to compute timezone for object of class ordered/factor; returning "UTC".
Dummy Dataframe
tibble(
Month = sample(month.name, 120, replace = TRUE),
Year = sample(2012:2024, 120, replace = TRUE),
Number_Daffodils = sample(1:5, 120, replace = TRUE)
)
Desired Output
year month Number_Daffodils Frequency_New_Bulbs date n_month
1 2015 January 36 31 2015-01-01 1
2 2015 February 28 28 2015-02-01 2
3 2015 March 39 31 2015-03-01 3
4 2015 April 46 30 2015-04-01 4
5 2015 May 5 6 2015-05-01 5
6 2015 June 0 0 2015-06-01 6
If the levels of your Month factor are correct, you can convert the factor to integer with as.integer() or pass it directly to lubridate::make_date():
library(dplyr)
Bulbs |>
janitor::clean_names() |>
mutate(date = lubridate::make_date(year = year, month = month),
m = as.integer(month))
#> # A tibble: 86 × 6
#> # Groups: year [13]
#> year month n frequency_new_bulbs date m
#> <int> <ord> <int> <int> <date> <int>
#> 1 2012 January 1 2 2012-01-01 1
#> 2 2012 February 4 9 2012-02-01 2
#> 3 2012 April 1 4 2012-04-01 4
#> 4 2012 May 3 10 2012-05-01 5
#> 5 2012 June 1 2 2012-06-01 6
#> 6 2012 July 1 2 2012-07-01 7
#> 7 2012 August 2 6 2012-08-01 8
#> 8 2012 September 1 2 2012-09-01 9
#> 9 2012 October 1 3 2012-10-01 10
#> 10 2012 November 2 9 2012-11-01 11
#> # ℹ 76 more rows
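
On the narrower question of turning an ordered factor back into a plain factor: a minimal base R sketch, assuming the levels were built from month.name as in the question. The data frame `bulbs` and its values are made up for illustration. One thing to watch is that calling factor() on an existing factor without `levels =` drops unused levels, which would shift the integer codes, so the sketch passes the original levels explicitly.

```r
# Hypothetical toy data standing in for the summarised Bulbs data frame
bulbs <- data.frame(
  year  = c(2012, 2012, 2012, 2013),
  month = factor(c("January", "February", "April", "February"),
                 levels = month.name, ordered = TRUE)
)

# Drop the ordering flag but keep all twelve levels in their original order.
# Passing levels = levels(...) prevents unused months from being dropped.
unordered_month <- factor(bulbs$month,
                          levels = levels(bulbs$month),
                          ordered = FALSE)

# The integer codes of the factor equal the month numbers,
# because the levels follow month.name (January = 1, ..., December = 12)
bulbs$n_month <- as.integer(bulbs$month)

# Build a first-of-month date from the year and the month number
bulbs$date <- as.Date(sprintf("%d-%02d-01",
                              as.integer(bulbs$year),
                              bulbs$n_month))
```

Here `is.ordered(unordered_month)` is FALSE while `levels(unordered_month)` still matches month.name, so nothing about the January-to-December order is lost. For the time series step itself, the conversion to a plain factor is probably not needed, since as.integer() works on the ordered factor as well.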