
R - one scale multiple column recode


A fellow researcher and I are trying to find a way to make our data frame cleaner and less cluttered. Here is a reprex:

> head(Dummy1)
# A tibble: 6 x 18
     A0    A1    A2    A3    A4    A5    B0    B1    B2    B3    B4    B5    C0    C1    C2    C3    C4
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0
2     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1
3     0     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0
4     0     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0
5     0     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0
6     1     0     0     0     0     0     1     0     0     0     0     0     1     0     0     0     0
# … with 1 more variable: C5 <dbl>
> 
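To make the example copy-pasteable, the printout above can be rebuilt in base R. This is a sketch that assumes a plain data.frame is an acceptable stand-in for the original tibble (it avoids any package dependency); the values match the six printed rows.

```r
# Rebuild the Dummy1 printout: 6 respondents, items A, B, C, answers 0-5.
# Row 1 answered 5 on every item, row 2 answered 4, ..., row 6 answered 0.
answers <- 5:0
onehot  <- outer(answers, 0:5, `==`) * 1   # 6 x 6 matrix of 0/1 indicators
Dummy1  <- as.data.frame(cbind(onehot, onehot, onehot))
names(Dummy1) <- paste0(rep(c("A", "B", "C"), each = 6), 0:5)  # A0..A5, B0..B5, C0..C5
head(Dummy1)
```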

Because of the way our software recorded answers, we got separate columns A0 through A5, B0 through B5, etc., instead of this:

> head(Dummy2)
# A tibble: 6 x 3
      A     B     C
  <dbl> <dbl> <dbl>
1     5     5     5
2     4     4     4
3     3     3     3
4     2     2     2
5     1     1     1
6     0     0     0
> 

Is there code that would transform the first version, where each possible answer is its own binary column (0 = no, 1 = yes), into a single column per item holding the numeric answer? The scale we are analyzing has well over 50 items, each ranging from 0 to 8.

Thank you for your help!


Solution

  • You can use split.default to split the columns into one data frame per item, grouping together all columns that share the same letter. Then use sapply with max.col to get, for each row, the position of the column holding the highest value (the 1) within that item. I subtracted 1 because your answer columns start at 0.

    sapply(split.default(Dummy1, sub('\\d+', '', names(Dummy1))), max.col) - 1
    

    sub('\\d+', '', names(Dummy1)) removes the digits from each column name, giving "A" "A" "A" "A" "A" "A" "B" "B" "B" "B" ..., which split.default then uses as the grouping variable.
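Breaking the one-liner into its steps shows the intermediate objects. This sketch rebuilds Dummy1 from the printout as a plain data.frame (an assumption; the original is a tibble) and otherwise uses exactly the calls from the answer.

```r
# Dummy1 as printed in the question, rebuilt as a data.frame
onehot <- outer(5:0, 0:5, `==`) * 1
Dummy1 <- as.data.frame(cbind(onehot, onehot, onehot))
names(Dummy1) <- paste0(rep(c("A", "B", "C"), each = 6), 0:5)

# Step 1: strip the digits to get one group label per column
groups <- sub('\\d+', '', names(Dummy1))
groups                      # "A" "A" "A" "A" "A" "A" "B" ... "C"

# Step 2: split the columns into one data frame per item
parts <- split.default(Dummy1, groups)
names(parts)                # "A" "B" "C"

# Step 3: position of the 1 in each row of each item, minus 1 (answers start at 0)
result <- sapply(parts, max.col) - 1
result                      # each column is 5, 4, 3, 2, 1, 0, the same values as Dummy2
```

Note that sapply returns a matrix here; wrap it in as.data.frame(result) if you want a data frame like Dummy2.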
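Two caveats apply when scaling this to the full 0 to 8 scale. First, max.col - 1 assumes every item's columns appear in order 0, 1, 2, ... with none missing. Second, a respondent who skipped an item has an all-zero row, which max.col treats as a tie; with its default ties.method = "random" it then returns a random position. The sketch below is a hypothetical helper, recode_dummies (not part of the answer above), that instead reads the answer value from the trailing digits of each column name and returns NA unless exactly one column is 1. It swaps max.col for a matrix product of the 0/1 indicators with those values, and it assumes every column name is an item label followed by the answer value as trailing digits.

```r
# Hypothetical helper: recode one-hot answer columns into one numeric column
# per item. Assumes names look like "A0", "A5": item label + trailing digits.
recode_dummies <- function(df) {
  nms    <- names(df)
  prefix <- sub("\\d+$", "", nms)                        # item label, e.g. "A"
  value  <- as.numeric(substring(nms, nchar(prefix) + 1))  # answer code, e.g. 5
  cols_by_item <- split(seq_along(nms), prefix)          # column indices per item

  out <- lapply(cols_by_item, function(idx) {
    m   <- as.matrix(df[idx])
    hit <- ((m == 1) & !is.na(m)) + 0    # 1 where that answer was chosen, else 0
    res <- rep(NA_real_, nrow(m))        # NA unless exactly one answer chosen
    ok  <- rowSums(hit) == 1
    # Multiplying each 0/1 row by the answer codes picks out the chosen code
    res[ok] <- (hit %*% value[idx])[ok]
    res
  })
  as.data.frame(out)
}
```

Calling recode_dummies(Dummy1) returns a data frame with columns A, B and C. Because the answer comes from the column name rather than the column's position, shuffled or incomplete sets of answer columns still recode correctly, and skipped items come out as NA instead of a random value.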