A fellow researcher and I are trying to make our data frame cleaner and less cluttered. Here is a reprex:
> head(Dummy1)
# A tibble: 6 x 18
A0 A1 A2 A3 A4 A5 B0 B1 B2 B3 B4 B5 C0 C1 C2 C3 C4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
2 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1
3 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0
4 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
5 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
6 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
# … with 1 more variable: C5 <dbl>
>
Because of the way our software recorded answers, we got columns A0 through A5, B0 through B5, etc., instead of this:
> head(Dummy2)
# A tibble: 6 x 3
A B C
<dbl> <dbl> <dbl>
1 5 5 5
2 4 4 4
3 3 3 3
4 2 2 2
5 1 1 1
6 0 0 0
>
Is there code that would let us transform the first version, where each possible answer is its own column coded 0 (no) or 1 (yes), into a single column per item holding the numeric answer? The scale we are trying to analyze has well over 50 items, each ranging from 0 to 8.
Thank you for your help!
You can use split.default to split the columns into one data frame per item group. Then use sapply with max.col to get the column number with the highest value in each row. I subtracted 1 since your column numbers start at 0.
sapply(split.default(Dummy1, sub('\\d+', '', names(Dummy1))), max.col) - 1
Here, sub('\\d+', '', names(Dummy1)) removes the number from each column name, returning "A" "A" "A" "A" "A" "A" "B" "B" "B" "B"......, which is used as the grouping vector in split.default.
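To try this out without the original data, here is a minimal sketch in base R. It assumes a plain data.frame standing in for the tibble, and the names answers, levels_per_item, one_hot, groups and result are made up for illustration. It rebuilds one-hot columns shaped like Dummy1 from the values shown in Dummy2, then applies the same split.default / max.col step.

```r
# Hypothetical target values, copied from the Dummy2 printout in the question.
answers <- data.frame(A = 5:0, B = 5:0, C = 5:0)
levels_per_item <- 0:5

# Build the one-hot layout: for each item, one 0/1 column per possible answer.
one_hot <- do.call(cbind, lapply(names(answers), function(item) {
  m <- sapply(levels_per_item, function(k) as.numeric(answers[[item]] == k))
  colnames(m) <- paste0(item, levels_per_item)  # A0, A1, ..., A5
  m
}))
Dummy1 <- as.data.frame(one_hot)

# Group label for each column: "A0" -> "A", "B3" -> "B", ...
groups <- sub('\\d+', '', names(Dummy1))

# One data frame per item; max.col gives the position of the 1 in each row,
# and subtracting 1 maps positions 1..6 back to answer codes 0..5.
result <- sapply(split.default(Dummy1, groups), max.col) - 1
result
```

The result is a numeric matrix with one column per item (A, B, C); wrap it in as.data.frame() if you need a data frame. One caveat: max.col breaks ties at random by default, so a row where no option was selected (all zeros within an item) would get an arbitrary answer. If that can happen in your data, it is probably worth checking rowSums() per item first, or at least passing ties.method = "first" so the behaviour is deterministic.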