Issue:
I have a large data frame (388 x 729) and I am trying to calculate the mean of a numeric column called 'Daffodil_Bulbs' for each month (over 14 years). 'Month' is stored as a factor.
I created a vector so that the months are output in the right order, but when I run my R code using the dplyr package, it does not recognise the month 'July' and replaces it with 'NA' (see the R code output below).
I've checked my data frame and there are no NAs or missing values.
Does anyone know how to fix this issue?
R-code:
#Create a vector so the months are in the right order
month_levels = c('January', 'February', 'March', 'April', 'May', 'June', 'July',
'August', 'September', 'October', 'November', 'December')
#Use dplyr to subset the data to find the average group size per month
Df_Average_Month <- MyDf %>% dplyr::mutate(Month=ordered(Month, levels=month_levels)) %>%
group_by(Month) %>%
summarise(Average_Daffodils = mean(Daffodil_Bulbs, na.rm = TRUE))
Output from the vector for month
> month_levels = c('January', 'February', 'March', 'April', 'May', 'June', 'July',
+ 'August', 'September', 'October', 'November', 'December')
Dataframe structure
$ Month : Factor w/ 18 levels "April","April ",..: 9 8 8 8 8 8 8 8 8 1 ...
$ Daffodil Bulbs : num 0 3 0 3 2 1 0 0 0 0 ...
R-code Output
# A tibble: 12 × 2
Month Average_Daffodils
<ord> <dbl>
1 January 11.4
2 February 11.3
3 March 12.4
4 April 8.67
5 May 12.6
6 June 12.5
7 August 9.67
8 September 12.7
9 October 9.92
10 November 9.19
11 December 10.8
12 NA 16.3
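The str() output shows that Month has 18 levels rather than 12, including both "April" and "April ", so some month labels carry trailing whitespace (or other spelling variants). ordered(Month, levels = month_levels) turns every value that does not exactly match one of the supplied levels into NA, which is most likely what happened to your July rows. Inspect levels(MyDf$Month) to see the mismatched labels, then clean them (for example with trimws()) before converting to an ordered factor. droplevels() will not help here, because the extra levels are in use; the NA is created during the conversion, not by missing values in the original data.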
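As a rough illustration of the whitespace problem and one possible fix, here is a minimal sketch. The toy data frame is made up (it only mimics labels such as "July " with a trailing space), and it assumes the column is really named Daffodil_Bulbs; the actual stray labels in your data may differ, so check them first.

```r
library(dplyr)

# Hypothetical toy data: "July " carries a trailing space, as "April " does in the question
MyDf <- data.frame(
  Month = factor(c("June", "July ", "July ", "August", "June")),
  Daffodil_Bulbs = c(2, 4, 6, 1, 4)
)

# Base R already provides the month names in calendar order
month_levels <- month.name

# Diagnose: labels that do not exactly match any expected month name
setdiff(levels(MyDf$Month), month_levels)

# Fix: strip surrounding whitespace before building the ordered factor
Df_Average_Month <- MyDf %>%
  mutate(Month = factor(trimws(as.character(Month)),
                        levels = month_levels, ordered = TRUE)) %>%
  group_by(Month) %>%
  summarise(Average_Daffodils = mean(Daffodil_Bulbs, na.rm = TRUE))

Df_Average_Month
```

If setdiff() returns labels that differ by more than whitespace (misspellings, different capitalisation), trimws() alone will not be enough and those values would need to be recoded explicitly.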