I have a dataframe 'df' that has categorical and POSIXct columns. The data look like:
Category | DateTime |
---|---|
A | 2022-08-29 00:00:00 |
A | 2022-08-29 00:00:00 |
A 1 | 2022-08-29 00:00:00 |
A 1 | 2022-08-29 00:00:00 |
A 1 | 2022-08-29 00:00:00 |
B | 2022-08-29 00:00:00 |
B | 2022-08-29 00:00:00 |
B | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
A | 2022-08-29 02:00:00 |
A 1 | 2022-08-29 02:00:00 |
B | 2022-08-29 02:00:00 |
B | 2022-08-29 02:00:00 |
B | 2022-08-29 02:00:00 |
B 1 | 2022-08-29 02:00:00 |
B 1 | 2022-08-29 02:00:00 |
B 1 | 2022-08-29 02:00:00 |
I would like to create a new dataframe 'df2' that counts each unique value in column 'Category' that doesn't end with " 1" by column 'DateTime', so that the results look like:
Category | DateTime | CatCount |
---|---|---|
A | 2022-08-29 00:00:00 | 2 |
B | 2022-08-29 00:00:00 | 3 |
A | 2022-08-29 02:00:00 | 1 |
B | 2022-08-29 02:00:00 | 3 |
We could filter
output the rows having 1, then do a count
library(dplyr)
library(stringr)
df1 %>%
filter(str_detect(Category, "\\s+1", negate = TRUE)) %>%
count(Category, DateTime, name = "CatCount")
-output
Category DateTime CatCount
1 A 2022-08-29 00:00:00 2
2 A 2022-08-29 02:00:00 1
3 B 2022-08-29 00:00:00 3
4 B 2022-08-29 02:00:00 3
df1 <- structure(list(Category = c("A", "A", "A 1", "A 1", "A 1", "B",
"B", "B", "B 1", "B 1", "B 1", "B 1", "B 1", "A", "A 1", "B",
"B", "B", "B 1", "B 1", "B 1"), DateTime = c("2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00")),
class = "data.frame", row.names = c(NA,
-21L))