Search code examples
rdataframecoalesce

How can I coalesce this dataset in R?


I am an ICU physician conducting research which involves taking lots of patient-related data from the ICU computer system (all ethically approved, etc). As is often the case, getting data out then requires cleaning and wrangling before it can be used properly.

I have obtained a set of data, and have wrangled it as best I can. Of course, my data science skills are pretty rudimentary and despite being an enthusiastic R user, I am at a complete blockage, and am hoping some of you might be able to shed some light on my problem and how to solve it. I absolutely cannot get round this, but suspect it is a commonly encountered issue in time-series work.

At present, my dataset now includes multiple rows for each time point. So, at time X there is an individual row for heart rate, blood pressure, etc. There are 46 observations, and this repeats for every time point (344 in total for this patient). All observations are not recorded at each time point. I have provided a link to screenshots of the way this data is arranged here.

A sample of the data is here, is that helps.

The best progress I've made is with the following nested for-loop structure. It works for the first set of observations. I have tried a strange while-loop arrangement that fell flat on its face.

# First, add a group to the entire table specifying each time point that 
# observations were conducted. 
Patient_full$Verification_group <- as.numeric(as.factor(Patient_full$Time))

# Get the number of these groups
observation_times <- max(Patient_full$Verification_group)

# Create the bare bones of an overall table. This is the first row of the table.
patient_obs_final <-  Patient_full[1,]

# Next I need to create a loop within loop. The master loop will coerce rows
# that have been created by the sub-loop.
for (i in 1 : observation_times) {
  
  # Isolate the overall observation group you are dealing with
  veri_group <- filter(Patient_full, Verification_group == i)

  # Start by getting some numbers to run the sub-loop
  lowest_obs_time_row <- min(veri_group$Row)
  highest_obs_time_row <- max(veri_group$Row)
  rows_in_obs_time <- (highest_obs_time_row - lowest_obs_time_row)
  
  # We can run the sub-loop now
  obs_at_timepoint <- Patient_full[lowest_obs_time_row, ]
  for (j in 1 : (rows_in_obs_time - 1)) {
    obs_at_timepoint <- coalesce(obs_at_timepoint, Patient_full[j + 1,])
    
  }
  
  patient_obs_final <-  rbind(patient_obs_final, obs_at_timepoint)
  
}

patient_obs_final

As soon as j goes to 2 the thing seems to fall apart.

So, in the end my goal is to have a separate row for each time point, and for that row to have whatever was recorded/observed at that time. I am at a loss, and can't even tell why my solution isn't working. Any advice would be greatly appreciated.


Solution

  • Don't do it in a loop. Only one summarise_at is enough !!

    library(tidyverse)
    Patient_01_sample <- read.csv("E:/R/StackOverflow/Patient_01_sample.xlsx - Sheet 1.csv", row.names=1)
    f = function(x) ifelse(length(x[!is.na(x)])==0,NA,x[!is.na(x)][1]) 
    Patient = read_csv("Patient_01_sample.xlsx - Sheet 1.csv")
    Patient %>% group_by(Time) %>% 
      summarise_at(vars(3:45), f)
    

    output

    # A tibble: 5 x 44
      Time                `Glucose ABG` `Lactate ABG` PaCO2  PaO2 `PH (ABG)` `Potassium ABG` `Sodium ABG` `Cardiac Rhythm` `Arterial Pressure Di~
      <dttm>                      <dbl>         <dbl> <dbl> <dbl>      <dbl>           <dbl>        <dbl> <chr>                             <dbl>
    1 2017-09-04 17:00:00          11.8          10.7  4.42  31.5       7.25             3.9          3.9 ST                                   NA
    2 2017-09-04 17:55:00          NA            NA   NA     NA        NA               NA           NA   NA                                   54
    3 2017-09-04 18:00:00          NA            NA   NA     NA        NA               NA           NA   ST                                   60
    4 2017-09-04 19:00:00          10.1           9.7  4.22  15         7.3              3.9          3.9 ST                                   58
    5 2017-09-04 20:00:00           8.1           7.2  5.07  16.9       7.27             3.9          3.9 ST                                   62
    # ... with 34 more variables: Arterial Pressure Mean <dbl>, Arterial Pressure Systolic <dbl>, Heart Rate <dbl>,
    #   Non Invasive Arterial Pressure Diastolic <dbl>, Non Invasive Arterial Pressure Mean <dbl>,
    #   Non Invasive Arterial Pressure Systolic <dbl>, Tympanic Temperature <dbl>, Patient Positioning ABG <lgl>, Central Venous Pressure <lgl>,
    #   Delivered Percent O2 <lgl>, Mean Airway Pressure S <lgl>, Minute Volume expired (S) <lgl>, Peak Inspiratory Pressure measured  S <lgl>,
    #   Positive End Expiratory pressure <lgl>, S Expired Tidal vol. (breath) <lgl>, S Tidal Volume Inspired <lgl>, Servo i Modes <lgl>,
    #   Set FiO2 <lgl>, Set Flow Trigger S <lgl>, Set Pause time % <lgl>, Set PEEP Servo <lgl>, Set rate (CMV or SIMV) <lgl>,
    #   Set Tidal Volume (servo) <lgl>, Set Upper Pressure Limit <lgl>, Spontaneous Rate  S <lgl>, Cardiac output (Vigileo) <lgl>, ...