Search code examples
rdataframegroup-bypivotspread

Splitting the date alongside pivot_wider


I've the following table.

Date Cat
15/2/1999 A
15/2/1999 A
15/2/1999 B
15/5/1999 A
15/5/1999 B
15/10/1999 C
15/10/1999 C
15/2/2001 A
15/2/2001 A
15/6/2001 B
15/6/2001 B
15/6/2001 C
15/11/2001 C
15/11/2001 C

I would like to apply pivot_wider (or any other similar functions) to it and also accounting for the Date and Year column as seen below. The Cat column is being split based on the variable A, B and C and the count is being displayed.

Month Year A B C Total
February 1999 2 1 0 3
May 1999 1 1 0 2
October 1999 0 0 2 2
February 2001 2 0 0 2
June 2001 0 2 1 3
November 2001 0 0 2 2

Does anyone here knows how I can do both together? Thanks


Solution

  • You can do this with tidyverse packages. First, format your date column as date, then count by month, pivot to wider and format the table.

    library(tidyverse)
    data %>% 
      mutate(Date = as.Date(Date, format = "%d/%m/%Y")) %>% 
      group_by(Cat, month = lubridate::floor_date(Date, "month")) %>% 
      count(Cat) %>% 
      pivot_wider(names_from = Cat, values_from = n, values_fill = 0) %>% 
      mutate(year = year(month), .before = "A",
             month = month(month, label = T, abbr = F)) %>% 
      mutate(Total = rowSums(across(A:C))) %>% 
      arrange(year)
    
      month     year     A     B     C Total
      <ord>    <dbl> <int> <int> <int> <dbl>
    1 February  1999     2     1     0     3
    2 May       1999     1     1     0     2
    3 October   1999     0     0     2     2
    4 February  2001     2     0     0     2
    5 June      2001     0     2     1     3
    6 November  2001     0     0     2     2
    

    data

    data <- structure(list(Date = c("15/2/1999", "15/2/1999", "15/2/1999", 
    "15/5/1999", "15/5/1999", "15/10/1999", "15/10/1999", "15/2/2001", 
    "15/2/2001", "15/6/2001", "15/6/2001", "15/6/2001", "15/11/2001", 
    "15/11/2001"), Cat = c("A", "A", "B", "A", "B", "C", "C", "A", 
    "A", "B", "B", "C", "C", "C")), class = "data.frame", row.names = c(NA, 
    -14L))