I have a large data frame (388 x 729) and I am trying to calculate the frequency of daffodil bulbs (numeric column) per month (factor column)_over 14 years.
Structure of my data frame:
$ Month : Factor w/ 18 levels "April","April ",..: 9 8 8 8 8 8 8 8 8 1 ...
$ Daffodil Bulbs : num 0 3 0 3 2 1 0 0 0 0 ...
When I run my code, the months are in the wrong order and it's not pooling the frequency of daffodil bulbs over the 14 years per month:
Count_Daffodils <- MyDf %>% mutate(Month=ordered(trimws(Month), levels=month.name)) %>%
group_by(Month) %>%
count(Number_Daffodils, sort=TRUE) %>%
ungroup()
Current output:
# A tibble: 92 × 3
Month Number_Boats n
<ord> <dbl> <int>
1 June 0 86
2 May 0 73
3 November 0 67
4 April 0 51
5 February 0 46
6 March 0 36
7 December 0 35
8 July 0 24
9 October 0 22
10 September 0 18
# ℹ 82 more rows
# ℹ Use `print(n = ...)` to see more rows
The output I'm trying to achieve:
Month Number_Daffodils
<ord> <int>
1 January 86
2 February 73
3 March 67
4 April 51
5 May 46
6 June 36
7 July 35
8 August 24
9 September 22
10 October 56
11 November 14
12 December 18
Currently your code is:
Month
and Number_Daffodils
If you're looking to sum per month then summarise
is what you want to use. Your factoring of Month
is working perfectly to order the months variable, which can then be sorted after summing:
library(tidyverse)
tibble(
Month = sample(month.name, 120, replace = TRUE),
Number_Daffodils = sample(1:5, 120, replace = TRUE)
) |>
mutate(Month = factor(Month, levels = month.name)) |>
summarise(Number_Daffodils = sum(Number_Daffodils), .by = "Month") |>
arrange(Month)
#> # A tibble: 12 × 2
#> Month Number_Daffodils
#> <fct> <int>
#> 1 January 23
#> 2 February 44
#> 3 March 50
#> 4 April 33
#> 5 May 25
#> 6 June 24
#> 7 July 33
#> 8 August 40
#> 9 September 45
#> 10 October 20
#> 11 November 24
#> 12 December 27