Search code examples
rdataframedatetimecountdata-manipulation

R Count unique values without specific symbol


I have a dataframe 'df' that has categorical and POSIXct columns. The data look like:

Category DateTime
A 2022-08-29 00:00:00
A 2022-08-29 00:00:00
A 1 2022-08-29 00:00:00
A 1 2022-08-29 00:00:00
A 1 2022-08-29 00:00:00
B 2022-08-29 00:00:00
B 2022-08-29 00:00:00
B 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
B 1 2022-08-29 00:00:00
A 2022-08-29 02:00:00
A 1 2022-08-29 02:00:00
B 2022-08-29 02:00:00
B 2022-08-29 02:00:00
B 2022-08-29 02:00:00
B 1 2022-08-29 02:00:00
B 1 2022-08-29 02:00:00
B 1 2022-08-29 02:00:00

I would like to create a new dataframe 'df2' that counts each unique value in column 'Category' that doesn't end with " 1" by column 'DateTime', so that the results look like:

Category DateTime CatCount
A 2022-08-29 00:00:00 2
B 2022-08-29 00:00:00 3
A 2022-08-29 02:00:00 1
B 2022-08-29 02:00:00 3

Solution

  • We could filter output the rows having 1, then do a count

    library(dplyr)
    library(stringr)
    df1 %>%
       filter(str_detect(Category, "\\s+1", negate = TRUE)) %>%
       count(Category, DateTime, name = "CatCount")
    

    -output

     Category            DateTime CatCount
    1        A 2022-08-29 00:00:00        2
    2        A 2022-08-29 02:00:00        1
    3        B 2022-08-29 00:00:00        3
    4        B 2022-08-29 02:00:00        3
    

    data

    df1 <- structure(list(Category = c("A", "A", "A 1", "A 1", "A 1", "B", 
    "B", "B", "B 1", "B 1", "B 1", "B 1", "B 1", "A", "A 1", "B", 
    "B", "B", "B 1", "B 1", "B 1"), DateTime = c("2022-08-29 00:00:00", 
    "2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00", 
    "2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00", 
    "2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00", 
    "2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00", 
    "2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00", 
    "2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00", 
    "2022-08-29 02:00:00", "2022-08-29 02:00:00")),
    class = "data.frame", row.names = c(NA, 
    -21L))