after a long time I am working again with R and thus I am a bit rusty. Every bit of help is appreciated.
So I have a survey out there which contains many matrix questions, even dual matrix. In this example it is a dual matirx is the survey question come formatted in Question.AnswerCode..FirstOrSecondMatrix. I made a frequency data frame who looks like this:
subsetting from my original data for reproduction - columns in german, not sorted
dput(PIL)
structure(list(gering = c(12L, 9L, 12L, 3L, 12L, 8L, 3L, 10L,
12L, 6L, 7L, 7L), hoch = c(32L, 45L, 28L, 43L, 39L, 44L, 20L,
20L, 17L, 20L, 28L, 18L), `keine Angabe` = c(6L, 20L, 12L, 30L,
9L, 24L, 9L, 16L, 10L, 17L, 10L, 17L), mittel = c(27L, 32L, 15L,
27L, 23L, 27L, 19L, 18L, 20L, 25L, 10L, 24L), `sehr gering` = c(17L,
10L, 28L, 1L, 14L, 2L, 1L, 1L, 3L, 2L, 15L, 4L), `sehr hoch` = c(22,
0, 21, 12, 19, 11, 18, 5, 8, 0, 0, 0), Summe = c(116, 116, 116,
116, 116, 116, 70, 70, 70, 70, 70, 70)), row.names = c("PIL.PILK1..1.",
"PIL.PILK1..2.", "PIL.PILK2..1.", "PIL.PILK2..2.", "PIL.PILK3..1.",
"PIL.PILK3..2.", "PIL2.PILK1..1.", "PIL2.PILK1..2.", "PIL2.PILK2..1.",
"PIL2.PILK2..2.", "PIL2.PILK3..1.", "PIL2.PILK3..2."), class = "data.frame")
and as I result I try to achieve this:
I would like to join(sum) the rows by the middle part of the string "PILK1" = Prozess1 and rename it into "Prozess 1..1." or Prozess1..2.". Then renaming the last step "..1." with grepl to the desired output i am pretty sure to get this done since it has only two options. But all my searching has not helped me with the renaming a part of a string from a second data frame with the partial strings and the name of the process. The names list(data frame) contains 100+ names as abbrev strings and full names.
Where I am stuck
Thanks a lot
PS: My workaround up to now is directly addressing it i.e. Prozess1_Intensity<- table(ifelse(!is.na(PIL.PILK1..1.),PIL.PILK1..1., PIL2.PILK1..1.), useNA = "always")
and bind all items into a data frame. But this is a lot of copy and paste or in my case chaining everything together in Excel and paste it into R, which becomes tedious if applied to the whole survey
If the lookup table to match the process name with it's abbrevation is called as lookup
.
library(dplyr)
library(tidyr)
lookup <- data.frame(prozess_name = c('Prozess1', 'Prozess2', 'Prozess3'),
abbrev = c('PILK1', 'PILK2', 'PILK3'))
lookup
# prozess_name abbrev
#1 Prozess1 PILK1
#2 Prozess2 PILK2
#3 Prozess3 PILK3
You can divide the rownames in 3 separate columns using extract
, join with lookup and sum the column values using across
.
PIL %>%
rownames_to_column('abbrev') %>%
extract(abbrev, c('PIL','abbrev', 'num'), '(PIL\\d?)\\.(PILK\\d+)\\.\\.(\\d+)\\.') %>%
left_join(lookup, by = 'abbrev') %>%
group_by(prozess_name, num) %>%
summarise(across(gering:Summe, sum, na.rm = TRUE)) %>%
ungroup
# prozess_name num gering hoch `keine Angabe` mittel `sehr gering` `sehr hoch` Summe
# <chr> <chr> <int> <int> <int> <int> <int> <dbl> <dbl>
#1 Prozess1 1 15 52 15 46 18 40 186
#2 Prozess1 2 19 65 36 50 11 5 186
#3 Prozess2 1 24 45 22 35 31 29 186
#4 Prozess2 2 9 63 47 52 3 12 186
#5 Prozess3 1 19 67 19 33 29 19 186
#6 Prozess3 2 15 62 41 51 6 11 186