
R: Divide column Y by Z with absolute (positive) output, then sum output per unique value for column X


For every jobtask, I am trying to combine value and weight (weight / 100, as in the desired output below) into two separate columns: outputpos for rows where value is +1 and outputneg for rows where value is -1. For outputneg, I specifically need absolute (positive) values.

Having done this, I would then like to add two more columns that sum outputpos and outputneg per occupation (a/b/c). I can't seem to figure it out; any help would be much appreciated!

occupation <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c")
jobtask <- c("1", "2", "3", "4","5", "6", "7", "8", "9", "10", "11", "12")
value <- c("1", "1", "0", "-1", "-1", "0", "-1", "1", "-1", "1", "0", "0")
weight <- c("95", "81", "97", "65", "43", "92", "89", "43", "58", "99", "35", "69")

df <- data.frame(occupation, jobtask, value, weight)

Desired output below:

   occupation jobtask value weight outputpos outputneg occupationpos occupationneg
1           a       1     1     95      0.95                    1.76          0.65
2           a       2     1     81      0.81                    1.76          0.65
3           a       3     0     97                              1.76          0.65
4           a       4    -1     65               -0.65          1.76          0.65
5           b       5    -1     43               -0.43          0.43          1.32
6           b       6     0     92                              0.43          1.32
7           b       7    -1     89               -0.89          0.43          1.32
8           b       8     1     43      0.43                    0.43          1.32
9           c       9    -1     58               -0.58          0.99          0.58
10          c      10     1     99      0.99                    0.99          0.58
11          c      11     0     35                              0.99          0.58
12          c      12     0     69                              0.99          0.58

Follow-up to handle multiple years: in the real df every row is one year, so each jobtask is repeated once per year. As a result, the "sum" for that specific occupation is now 5.42 (4 x 0.95 + 2 x 0.81), while it should be 0.95 + 0.81 = 1.76 as in the example above. Every jobtask has a different number of years, so I cannot simply divide by one number to account for the number of years. Any suggestions?

 occupation jobtask year value weight outputpos outputneg occupationpos occupationneg
1          a       1 2015     1     95      0.95                    5.42              
2          a       1 2016     1     95      0.95                    5.42             
3          a       1 2017     1     95      0.95                    5.42             
4          a       1 2018     1     95      0.95                    5.42             
5          a       2 2015     1     81      0.81                    5.42             
6          a       2 2016     1     81      0.81                    5.42


Solution

  • We could first convert the columns to numeric with type.convert, since they were created as character strings. Then create an 'output' column by dividing 'weight' by 100, and use case_when to fill 'outputpos' where 'value' is 1 and 'outputneg' where 'value' is -1 (rows that match neither condition get NA). Finally, grouped by 'occupation', sum 'output' where 'value' is 1 and where 'value' is -1 to create 'occupation_pos' and 'occupation_neg'. The latter comes out positive because 'output' itself carries no sign.

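    library(dplyr)
    # The columns were built from character vectors; convert them to numeric types
    df <- type.convert(df, as.is = TRUE)
    df %>% 
       # weight as a proportion, split by the sign of value (NA where neither matches)
       mutate(output = weight/100, 
              outputpos = case_when(value == 1 ~ output), 
              outputneg = case_when(value == -1 ~ -1 *output)) %>% 
       # per-occupation totals, repeated on every row of the group
       group_by(occupation) %>% 
       mutate(occupation_pos = sum(output[value == 1]),
              occupation_neg = sum(output[value == -1])) %>%
       # drop the helper column
       select(-output)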
    # A tibble: 12 x 8
    # Groups:   occupation [3]
    #   occupation jobtask value weight outputpos outputneg occupation_pos occupation_neg
    #   <chr>        <int> <int>  <int>     <dbl>     <dbl>          <dbl>          <dbl>
    # 1 a                1     1     95      0.95    NA               1.76          0.65 
    # 2 a                2     1     81      0.81    NA               1.76          0.65 
    # 3 a                3     0     97     NA       NA               1.76          0.65 
    # 4 a                4    -1     65     NA       -0.65            1.76          0.65 
    # 5 b                5    -1     43     NA       -0.43            0.43          1.32 
    # 6 b                6     0     92     NA       NA               0.43          1.32 
    # 7 b                7    -1     89     NA       -0.89            0.43          1.32 
    # 8 b                8     1     43      0.43    NA               0.43          1.32 
    # 9 c                9    -1     58     NA       -0.580           0.99          0.580
    #10 c               10     1     99      0.99    NA               0.99          0.580
    #11 c               11     0     35     NA       NA               0.99          0.580
    #12 c               12     0     69     NA       NA               0.99          0.580
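
The code above does not cover the follow-up about multiple years in the question. One possible approach, sketched here under the assumption that 'value' and 'weight' stay the same across all years of a given jobtask, is to compute the occupation totals from one row per (occupation, jobtask) and join them back onto the yearly rows. The names df_years, totals and result are made up for this illustration, and the sample data only reproduces the six rows shown in the question.

```r
library(dplyr)

# Hypothetical multi-year data: each jobtask is repeated once per year
df_years <- data.frame(
  occupation = c("a", "a", "a", "a", "a", "a"),
  jobtask    = c(1, 1, 1, 1, 2, 2),
  year       = c(2015, 2016, 2017, 2018, 2015, 2016),
  value      = c(1, 1, 1, 1, 1, 1),
  weight     = c(95, 95, 95, 95, 81, 81)
)

# Occupation totals from one row per (occupation, jobtask).
# Assumption: value and weight do not change between years of a jobtask,
# so distinct() collapses the yearly repeats to a single row.
totals <- df_years %>%
  distinct(occupation, jobtask, value, weight) %>%
  group_by(occupation) %>%
  summarise(occupation_pos = sum(weight[value == 1]) / 100,
            occupation_neg = sum(weight[value == -1]) / 100,
            .groups = "drop")

# Per-row outputs as before, then attach the de-duplicated totals
result <- df_years %>%
  mutate(output = weight / 100,
         outputpos = case_when(value == 1 ~ output),
         outputneg = case_when(value == -1 ~ -1 * output)) %>%
  select(-output) %>%
  left_join(totals, by = "occupation")
```

With this data, occupation_pos is 1.76 on every row instead of 5.42, because each jobtask contributes once regardless of how many years it spans. If 'value' or 'weight' can differ between years, distinct() would keep several rows per jobtask and the totals would need a different rule (for example, picking one reference year).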