Search code examples
rdataframeindexingsetrow

Index and assign multiple sets of rows at once


I have an imported dataframe Measurements that contains many observations from an experiment.

Measurements <- data.frame(X = 1:4,
           Data = c(90, 85, 100, 105))
X   Data
1   90          
2   85          
3   100         
4   105 

I want to add another column Condition that specifies the treatment group for each datapoint. I know which obervation ranges are from which condition (e.g. observations 1:2 are from the control and observations 3:4 are from the experimental group).

I have devised two solutions already that give the desired output but neither are ideal. First:

Measurements["Condition"] <- c(rep("Cont", 2), rep("Exp", 2))
X   Data Condition      
1   90   Cont       
2   85   Cont       
3   100  Exp        
4   105  Exp

The benefit of this is it is one line of code/one command. But this is not ideal since I need to do math outside separately (e.g. 3:4 = 2 obs, etc) which can be tricky/unclear/indirect with larger datasets and more conditions (e.g. 47:83 = ? obs, etc) and would be liable to perpetuating errors since a small error in length for an early assignment would also shift the assignment of later groups (e.g. if rep of Cont is mistakenly 1, then Exp gets mistakenly assigned to 2:3 too).

I also thought of assigning like this, which gives the desired output too:

Measurements[1:2, "Condition"] <- "Cont"
Measurements[3:4, "Condition"] <- "Exp"
X   Data Condition      
1   90   Cont       
2   85   Cont       
3   100  Exp        
4   105  Exp

This makes it more clear/simple/direct which rows will receive which assignment, but this requires separate assignments and repetition. I feel like there should be a way to "vectorize" this assignment, which is the solution I'm looking for.

I'm having trouble finding complex indexing rules from online. Here is my first intuitive guess of how to achieve this:

Measurements[c(1:2, 3:4), "Condition"] <- list("Cont", "Exp")

X   Data Condition
1   90   Cont
2   85   Cont       
3   100  Cont       
4   105  Cont

But this doesn't work. It seems to combine 1:2 and 3:4 into a single equivalent range (1:4) and assigns only the first condition to this range, which suggests I also need to specify the column again. When I try to specify the column again:

Measurements[c(1:2, 3:4), c("Condition", "Condition")] <- list("Cont", "Exp")
X   Data Condition Condition.1    
1   90   Cont      Exp  
2   85   Cont      Exp  
3   100  Cont      Exp  
4   105  Cont      Exp

For some reason this creates a second new column (??), and it again seems to combine 1:2 and 3:4 into essentially 1:4. So I think I need to index the two row ranges in a way that keeps them separate and only specify the column once, but I'm stuck on how to do this. I assume the solution is simple but I can't seem to find an example of what I'm trying to do. Maybe to keep them separate I do have to assign them separately, but I'm hoping there is a way.

Can anyone help? Thank you a ton in advance from an R noobie!


Solution

  • If you already have a list of observations which belong to each condition you could use dplyr::case_when to do a conditional mutate. Depending on how you have this information stored you could use something like the following:

    library(dplyr)
    
    Measurements <- data.frame(X = 1:4,
                               Data = c(90, 85, 100, 105))
    
    # set which observations belong to each condition
    Cont <- 1:2
    Exp <- 3:4
    
    Measurements %>%
      mutate(Condition = case_when(
        X %in% Cont ~ "Cont",
        X %in% Exp ~ "Exp"
      ))
    
    # X Data Condition
    # 1   90      Cont
    # 2   85      Cont
    # 3  100       Exp
    # 4  105       Exp
    

    Note that this does not require the observations to be in consecutive rows.