r statistics regression subset linear-regression

How to subset the data based on available rows of a variable within a specific level of another variable


Let's say I have the following data frame:

ID <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
X1<-c(0.1,0.3,NA,2.2,0,NA,0.1,NA,1.4,2.3,0,NA,NA,0.3,2.8,2.3,0,NA)
X2<-c(0.8,NA,1.2,0.3,NA,NA,0.8,NA,1.5,NA,2.2,NA,0.8,NA,1.7,0.3,1.1,2.4)
X3<-c(1.1,0.2,0.4,0.8,NA,0.6,1.1,3.2,2.4,0.8,NA,NA,1.1,0.2,0.4,0.8,NA,0.6)
Time<-rep(c("baseline","week1","week2","week3","week4","week5"), times = 3)
data<-data.frame(ID,X1,X2,X3,Time)

Now, X1 is the predictor and X2 and X3 are my outcomes.

What I want is to assess the relationship between X2 at week5 and X1 at week3, and also between X3 at week5 and X1 at week3 (as in an ordinary linear regression and a Pearson correlation test).

The reason for subsetting (or at least, subsetting is the solution that occurs to me) is that all patients are combined in one data frame, but only those with a week5 outcome assessment should be included.

Does anyone know of any code that could do this?


Solution

  • You could pivot the data to wide format and then compute the correlations you want:

    library(tidyverse)
    
    wide_data <- data |>
      pivot_wider(names_from = Time, values_from = c(X1, X2, X3))
    
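    # use = "complete.obs" keeps only patients with both values observed;
    # without it, cor() returns NA as soon as any value is missing
    with(wide_data, cor(X1_week3, X2_week5, use = "complete.obs"))
    with(wide_data, cor(X1_week3, X3_week5, use = "complete.obs"))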
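
The question also asks for a linear regression and a Pearson correlation test on only those patients who have a week5 outcome. A minimal base-R sketch of that subsetting step might look like the following. The helper `analyse_pair` and the object names `pred`, `out`, `paired`, `sub_X2` and `sub_X3` are hypothetical names introduced here for illustration, and the guard of at least three complete pairs is there because `cor.test()` refuses to run on fewer.

```r
# Data from the question (Time values quoted so they are character strings)
ID <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
X1 <- c(0.1,0.3,NA,2.2,0,NA,0.1,NA,1.4,2.3,0,NA,NA,0.3,2.8,2.3,0,NA)
X2 <- c(0.8,NA,1.2,0.3,NA,NA,0.8,NA,1.5,NA,2.2,NA,0.8,NA,1.7,0.3,1.1,2.4)
X3 <- c(1.1,0.2,0.4,0.8,NA,0.6,1.1,3.2,2.4,0.8,NA,NA,1.1,0.2,0.4,0.8,NA,0.6)
Time <- rep(c("baseline","week1","week2","week3","week4","week5"), times = 3)
data <- data.frame(ID, X1, X2, X3, Time)

# Predictor: X1 at week3, one row per patient
pred <- data[data$Time == "week3", c("ID", "X1")]
# Outcomes: X2 and X3 at week5, one row per patient
out <- data[data$Time == "week5", c("ID", "X2", "X3")]
# One row per patient with predictor and outcomes side by side
paired <- merge(pred, out, by = "ID")

# Keep only patients with both the week3 predictor and the week5 outcome observed
sub_X2 <- paired[complete.cases(paired[, c("X1", "X2")]), ]
sub_X3 <- paired[complete.cases(paired[, c("X1", "X3")]), ]

# Hypothetical helper: fit the regression and run the Pearson test,
# but only when there are enough complete pairs for cor.test()
analyse_pair <- function(d, outcome) {
  if (nrow(d) < 3) {
    message("Only ", nrow(d), " complete pair(s) for ", outcome, "; skipping.")
    return(NULL)
  }
  list(
    fit  = lm(reformulate("X1", response = outcome), data = d),
    test = cor.test(d$X1, d[[outcome]], method = "pearson")
  )
}

res_X2 <- analyse_pair(sub_X2, "X2")
res_X3 <- analyse_pair(sub_X3, "X3")
```

With the three-patient example data there is only one complete pair for X2 and two for X3, so both calls are skipped and return `NULL`. On a real data set with enough patients, `summary(res_X2$fit)` and `res_X2$test` would give the regression summary and the correlation test.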