Tags: r, dplyr, reprex

Test condition of two columns on groups


I'm trying to create a new column that checks, within each group (id and number), whether two columns (classification and classification-1) contain the same set of values.

This is the original data frame:

reprex <- tribble(~"id",    ~"number",  ~"year",   ~"classification",          ~"classification-1",
                  5,        7020,    2015,    "Trading de servicios",    "Servicios empresariales",
                  2,        4649,    2015,                 "Trading",                  "Comercial",
                  2,        4649,    2015,               "Comercial",                    "Trading",
                  2,        4649,    2016,                 "Trading",                  "Comercial",
                  2,        4649,    2016,               "Comercial",                    "Trading",
                  3,        4651,      2015,                   "Trading",                    "Comercial",
                  3,        4651,      2015,                   "Trading",                   "Comisiones",
                  3,        4651,      2015,                 "Comercial",                      "Trading",
                  3,        4651,      2015,                 "Comercial",                   "Comisiones")

I want to get this:

reprex <- tribble(~"id",    ~"number",  ~"year",   ~"classification",          ~"classification-1", ~"check",
                  5,        7020,    2015,    "Trading de servicios",    "Servicios empresariales",     TRUE,
                  2,        4649,    2015,                 "Trading",                  "Comercial",     TRUE,
                  2,        4649,    2015,               "Comercial",                    "Trading",     TRUE,
                  2,        4649,    2016,                 "Trading",                  "Comercial",     TRUE,
                  2,        4649,    2016,               "Comercial",                    "Trading",     TRUE,
                  3,        4651,      2015,                   "Trading",                    "Comercial",    FALSE,
                  3,        4651,      2015,                   "Trading",                   "Comisiones",    FALSE,
                  3,        4651,      2015,                 "Comercial",                      "Trading",    FALSE,
                  3,        4651,      2015,                 "Comercial",                   "Comisiones",    FALSE)

Solution

  • Perhaps this would help. It marks a group as TRUE when the two columns share at least one value:

    library(dplyr)
    reprex %>%
        group_by(id, number) %>% 
        mutate(check = length(intersect(classification, `classification-1`)) > 0)
    

    Or, if we need to check that all the unique elements match, then after grouping by 'id' and 'number', compare the unique values of classification and classification-1 with setequal:

    reprex %>%
        group_by(id, number) %>%
        mutate(check = setequal(sort(unique(classification)), 
                                  sort(unique(`classification-1`))))
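
To see how the two checks above differ, here is a minimal sketch that runs both on the question's sample data and keeps one row per group. The column names any_shared and same_set and the object summary_tbl are hypothetical, chosen only for this illustration. setequal is called directly on the columns, since it already ignores order and duplicates.

```r
library(dplyr)
library(tibble)

# Sample data from the question (same values, compact layout).
reprex <- tribble(
  ~id, ~number, ~year, ~classification,        ~`classification-1`,
  5,   7020,    2015,  "Trading de servicios", "Servicios empresariales",
  2,   4649,    2015,  "Trading",              "Comercial",
  2,   4649,    2015,  "Comercial",            "Trading",
  2,   4649,    2016,  "Trading",              "Comercial",
  2,   4649,    2016,  "Comercial",            "Trading",
  3,   4651,    2015,  "Trading",              "Comercial",
  3,   4651,    2015,  "Trading",              "Comisiones",
  3,   4651,    2015,  "Comercial",            "Trading",
  3,   4651,    2015,  "Comercial",            "Comisiones"
)

result <- reprex %>%
  group_by(id, number) %>%
  mutate(
    # TRUE when the two columns share at least one value within the group
    # (hypothetical column name)
    any_shared = length(intersect(classification, `classification-1`)) > 0,
    # TRUE when both columns hold exactly the same set of values within the group
    # (hypothetical column name)
    same_set = setequal(classification, `classification-1`)
  ) %>%
  ungroup()

# One row per group, so the two checks can be compared side by side.
summary_tbl <- result %>%
  distinct(id, number, any_shared, same_set)

print(summary_tbl)
```

On this data, group (2, 4649) is TRUE under both checks, and group (3, 4651) is TRUE for any_shared but FALSE for same_set, because "Comisiones" appears only in classification-1. Group (5, 7020) is FALSE under both checks, whereas the desired output in the question shows TRUE for that row. If single-row groups are meant to count as TRUE, an extra condition such as `n() == 1` would be needed, but that is an assumption about the intent.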