Search code examples
rvariablesinverse

I´m looking for a way to inverse the values of 2 columns of a row when this inverse exists in the dataframe?


Here is my dataframe:

DF <- data.frame(
     VAR1 = c("A", "A", "B", "B", "B", "C", "C"),
     VAR2 = c("B", "C", "A", "D", "C", "B", "D"),
     VAR3 = c(1, 1, 1, 2, 4, 6, 4)
    )

I would like to have this:

VAR1 VAR2 VAR3  
A    B    2 
A    C    1
B    D    2
B    C    10  
C    D    4

If There is two rows like (VAR1=A, VAR2=B, VAR3=X) and (VAR2=B, VAR1=A, VAR3=Y), I want to have one row like this one (VAR1=A, VAR2=B, VAR3=X+Y). So if the two first variables are "inverse", I would like to have one row with the sum of them.

I tried to have a column which says "Yes" if two rows have inverse values but I can´t find a way to do it. My code:

DF <- DF %>%
  mutate(VAR4 = case_when(VAR2 %in% DF$VAR1 & 
                          VAR1 %in% 
                            (DF %>% 
                                filter(VAR1 == VAR2) %>% 
                                pull(VAR2)
                             ) ~ "Yes",
                             TRUE ~ 'No' ))
    `

This is the result:

VAR1   VAR2   VAR3   VAR4  
A      B      1      No
A      C      1      No 
B      A      1      No 
B      D      2      No 
B      C      4      No 
C      B      6      No 
C      D      4      No

My code doesn´t work because my filter doesn´t take the result of VAR2 %in% DF$VAR1 in account.

Does someone have an idea?


Solution

  • You can sort first with apply, and then summarise:

    DF[1:2] <- t(apply(DF[1:2], 1, sort))
    DF %>% 
      group_by(VAR1, VAR2) %>% 
      summarise(VAR3 = sum(VAR3))
    
    # A tibble: 5 × 3
    # Groups:   VAR1 [3]
      VAR1  VAR2   VAR3
      <chr> <chr> <dbl>
    1 A     B         2
    2 A     C         1
    3 B     C        10
    4 B     D         2
    5 C     D         4
    

    Or, in single pipe:

    DF %>% 
      mutate(VAR = pmap(., ~ sort(c(..1, ..2)) %>% 
                          set_names(., c("VAR1", "VAR2")))) %>% 
      group_by(VAR) %>% 
      summarise(VAR3 = sum(VAR3)) %>% 
      unnest_wider(VAR)