r  dataset  data-cleaning  logic

How can I write an R script to check for straight-lining, i.e., whether all values in a set of columns are the same within any given row?


I would like to create a dichotomous variable that tells me whether a participant gave the same response to each of 10 questions. Each row is a participant and I want to write a simple script to create this new variable/vector in my data frame. For example, if my data looks like the first 6 columns, then I'm trying to create the 7th one.

ID   Item1  Item2  Item3  Item4  Item5  | AllSame
1    5      5      5      5      5      | Yes
2    1      3      3      3      2      | No
3    2      2      2      2      2      | Yes
4    5      4      5      5      5      | No
5    5      2      3      5      5      | No

I've seen solutions on this site that compare one column to another, for example here with ifelse(data$item1==data$item2,1,ifelse(data$item1==data$item3,0,NA)), but I have 10 columns in my actual dataset and I figure there's got to be a better way than checking all 10 against each other. I could also create a variable that counts how many responses equal 1 and then test whether that count matches the number of columns, but with 7 possible responses in the data this once again looks very unwieldy, and I'm hoping someone has a better solution. Thank you!


Solution

  • There are many ways of doing this, but here is one:

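    # Example data modelled on the table in the question
    mydf <- data.frame(Item1 = c(5,1,2,5,5),
                       Item2 = c(5,3,2,4,2),
                       Item3 = c(5,3,2,5,3),
                       Item4 = c(5,3,2,5,5),
                       Item5 = c(5,3,2,5,5))

    # TRUE when every item in the row equals the first item
    mydf$AllSame <- rowMeans(mydf[,1:5] == mydf[,1]) == 1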
    

    which leads to

    > mydf
      Item1 Item2 Item3 Item4 Item5 AllSame
    1     5     5     5     5     5    TRUE
    2     1     3     3     3     3   FALSE
    3     2     2     2     2     2    TRUE
    4     5     4     5     5     5   FALSE
    5     5     2     3     5     5   FALSE
    
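To see why the one-liner works, it may help to split it into its intermediate steps. The sketch below reuses the example data from above; the names same_as_first, share_matching and all_same are introduced here purely for illustration.

```r
# Same example data as in the answer above.
mydf <- data.frame(Item1 = c(5, 1, 2, 5, 5),
                   Item2 = c(5, 3, 2, 4, 2),
                   Item3 = c(5, 3, 2, 5, 3),
                   Item4 = c(5, 3, 2, 5, 5),
                   Item5 = c(5, 3, 2, 5, 5))

# Step 1: compare every item column with the first column, row by row.
# The result is a logical matrix with one row per participant.
same_as_first <- mydf[, 1:5] == mydf[, 1]

# Step 2: the mean of a logical row is the share of TRUE values in it
# (TRUE counts as 1, FALSE as 0).
share_matching <- rowMeans(same_as_first)

# Step 3: the share is exactly 1 only when every item matches the first,
# which is the same as all items being equal to each other.
all_same <- share_matching == 1
```

Comparing against the first column is enough because equality is transitive: if every item equals Item1, all items equal each other, so no pairwise checks are needed.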

    If you need "Yes" and "No" rather than TRUE and FALSE, use something like this instead:

    mydf$AllSame <- ifelse(rowMeans(mydf[,1:5] == mydf[,1]) == 1, "Yes", "No")
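
For the real dataset with 10 item columns and an ID column, it may be more convenient to pick the columns by name rather than by position. The following is a rough sketch of one alternative, not part of the original answer: it counts the distinct values in each row with apply() and unique(). The data frame dat, the "^Item" name pattern and the NA in the last row are assumptions made for illustration. Note that the rowMeans() version above returns NA for any row that contains a missing response, whereas this variant treats NA as its own value and returns FALSE for such a row.

```r
# Hypothetical data: an ID column plus item columns, with one missing response.
dat <- data.frame(ID    = 1:5,
                  Item1 = c(5, 1, 2, 5, 5),
                  Item2 = c(5, 3, 2, 4, 2),
                  Item3 = c(5, 3, 2, 5, 3),
                  Item4 = c(5, 3, 2, 5, 5),
                  Item5 = c(5, 3, 2, 5, NA))

# Select the item columns by name pattern (assumed naming scheme), so the
# same code works for 5 or 10 items and ignores the ID column.
item_cols <- grep("^Item", names(dat), value = TRUE)

# A row is straight-lined when its responses contain exactly one distinct value.
# unique() keeps NA as a separate value, so rows with a missing response
# are not flagged.
dat$AllSame <- apply(dat[, item_cols], 1, function(x) length(unique(x)) == 1)
```

With this data, only participants 1 and 3 are flagged. If rows with missing responses should be judged on the answered items alone, one option would be to drop the NA values inside the function before counting the distinct values.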