For every jobtask, I am trying to divide value by weight, with two separate columns: one for the +1 values and one for the -1 values. For outputnegative, I would specifically need absolute (positive) values.
Having this, I would then like to add two more columns that sum outputpositive and outputnegative for each occupation (a/b/c). I can't seem to figure it out, any help would be much appreciated!
occupation <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c")
jobtask <- c("1", "2", "3", "4","5", "6", "7", "8", "9", "10", "11", "12")
value <- c("1", "1", "0", "-1", "-1", "0", "-1", "1", "-1", "1", "0", "0")
weight <- c("95", "81", "97", "65", "43", "92", "89", "43", "58", "99", "35", "69")
df <- data.frame(occupation, jobtask, value, weight)
Desired output below:
   occupation jobtask value weight outputpos outputneg occupationpos occupationneg
 1          a       1     1     95      0.95                    1.76          0.65
 2          a       2     1     81      0.81                    1.76          0.65
 3          a       3     0     97                              1.76          0.65
 4          a       4    -1     65               -0.65          1.76          0.65
 5          b       5    -1     43               -0.43          0.43          1.32
 6          b       6     0     92                              0.43          1.32
 7          b       7    -1     89               -0.89          0.43          1.32
 8          b       8     1     43      0.43                    0.43          1.32
 9          c       9    -1     58               -0.58          0.99          0.58
10          c      10     1     99      0.99                    0.99          0.58
11          c      11     0     35                              0.99          0.58
12          c      12     0     69                              0.99          0.58
Extra addition to handle multiple years. In the df every row is one year, so the "sum" for that specific occupation is now 5.42 (4 x 0.95 + 2 x 0.81), while it should be 0.95 + 0.81 = 1.76 like it was in the example above. Each jobtask has a different number of years, so I cannot simply divide by one number to account for the number of years. Any suggestions?
  occupation jobtask year value weight outputpos outputneg occupationpos occupationneg
1          a       1 2015     1     95      0.95                    5.42
2          a       1 2016     1     95      0.95                    5.42
3          a       1 2017     1     95      0.95                    5.42
4          a       1 2018     1     95      0.95                    5.42
5          a       2 2015     1     81      0.81                    5.42
6          a       2 2016     1     81      0.81                    5.42
We could first convert the column types to numeric with type.convert, then create an 'output' column by dividing 'weight' by 100. With case_when, we create 'outputpos' and 'outputneg' depending on whether 'value' is 1 or -1 (rows that match neither condition get NA). Finally, grouped by 'occupation', we take the sum of 'output' where 'value' is 1 and where 'value' is -1 to create 'occupation_pos' and 'occupation_neg'.
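library(dplyr)

# Convert the character columns to their natural (numeric) types
df <- type.convert(df, as.is = TRUE)

df %>%
  mutate(output = weight / 100,
         # NA wherever the condition does not match
         outputpos = case_when(value == 1 ~ output),
         outputneg = case_when(value == -1 ~ -1 * output)) %>%
  group_by(occupation) %>%
  # Per-occupation totals, repeated on every row of the group
  mutate(occupation_pos = sum(output[value == 1]),
         occupation_neg = sum(output[value == -1])) %>%
  select(-output)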
# A tibble: 12 x 8
# Groups:   occupation [3]
#   occupation jobtask value weight outputpos outputneg occupation_pos occupation_neg
#   <chr>        <int> <int>  <int>     <dbl>     <dbl>          <dbl>          <dbl>
# 1 a                1     1     95      0.95        NA           1.76           0.65
# 2 a                2     1     81      0.81        NA           1.76           0.65
# 3 a                3     0     97        NA        NA           1.76           0.65
# 4 a                4    -1     65        NA     -0.65           1.76           0.65
# 5 b                5    -1     43        NA     -0.43           0.43           1.32
# 6 b                6     0     92        NA        NA           0.43           1.32
# 7 b                7    -1     89        NA     -0.89           0.43           1.32
# 8 b                8     1     43      0.43        NA           0.43           1.32
# 9 c                9    -1     58        NA    -0.580           0.99          0.580
#10 c               10     1     99      0.99        NA           0.99          0.580
#11 c               11     0     35        NA        NA           0.99          0.580
#12 c               12     0     69        NA        NA           0.99          0.580
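For the multi-year follow-up in the question, one possible approach is to let each jobtask contribute only once to the occupation total. The sketch below assumes that 'value' and 'weight' do not change across years for a given jobtask (as in the question's sample rows); df_years is a hypothetical data frame rebuilt from those six rows, and the use of duplicated() is my suggestion rather than something taken from the original post.

```r
library(dplyr)

# Hypothetical multi-year data modelled on the question's second table;
# value and weight are assumed constant across years within a jobtask.
df_years <- data.frame(
  occupation = "a",
  jobtask    = c(1, 1, 1, 1, 2, 2),
  year       = c(2015, 2016, 2017, 2018, 2015, 2016),
  value      = 1,
  weight     = c(95, 95, 95, 95, 81, 81)
)

result <- df_years %>%
  mutate(output = weight / 100) %>%
  group_by(occupation) %>%
  mutate(
    # !duplicated(jobtask) is TRUE only for the first row of each jobtask
    # within the occupation, so every jobtask is counted once in the sum
    occupation_pos = sum(output[value == 1 & !duplicated(jobtask)]),
    occupation_neg = sum(output[value == -1 & !duplicated(jobtask)])
  ) %>%
  ungroup() %>%
  select(-output)

result
```

With this sample, occupation_pos comes out as 0.95 + 0.81 = 1.76 on every row instead of 5.42. If 'value' or 'weight' can differ between years for the same jobtask, the first-row rule would need to be replaced by an explicit choice, such as a per-jobtask mean.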