Search code examples
roptimization

How to efficiently transform wide-format patient information data into time series of patient counts?


Data

Given the following data frame where each row represents information on a single patient

  • DateStart contains the date of admission to the hospital of the individual patient
  • DateEnd contains the day of hospital discharge of the individual patient
  • Length is just the difference of DateEnd and DateStart
DateStart DateEnd Length
2020-03-02 2020-04-10 39 days
2020-03-30 2020-04-09 10 days
2020-03-15 2020-03-28 13 days
2020-06-02 2020-06-20 18 days

and I basically want to transform this into a time series of counts of patients per day kinda like the sample below

Days Frequency
2020-03-02 1
2020-03-03 1
2020-03-04 1
2020-03-05 1
2020-03-06 1
2020-03-07 1
2020-03-08 1
2020-03-09 1
2020-03-10 1
2020-03-11 1

Problem The issue is that the one I made to solve this is kinda slow, CaseCountGenerator(), and I wonder if there is a more efficient approach.

my working code is as follows

library(tidyverse)
createDays=function(start,end){
  Days=seq.Date(from=start,to=end,by=1)
  return(Days)
}

CaseCountGenerator=function(Starts,Ends,Length,Days){
  Counts=rep(0,length(Days))
  
  for(j in 1:length(Starts)){
    
    for(l in 1:length(Days)){
      
      if(Days[l]==Starts[j]){
        for(m in ((1:Length[j])-1)){
          Counts[l+m]=Counts[l+m]+1
        }
        break
      }
      
      
    }
  }
  
  return(Counts)
}

a sample data

tempdf = structure(list(DateStart = structure(c(18323, 18351, 18336, 18415, 
18417, 18418, 18428, 18439, 18439, 18438, 18440, 18439, 18444, 
18451, 18444, 18452, 18444, 18445, 18450, 18452), tzone = "Asia/Manila", class = "Date"), 
DateEnd = structure(c(18362, 18361, 18349, 18433, 18470, 
18460, 18447, 18460, 18458, 18458, 18459, 18459, 18472, 18463, 
18464, 18464, 18458, 18464, 18472, 18470), tzone = "Asia/Manila", class = "Date"), 
Length = structure(c(39, 10, 13, 18, 53, 42, 19, 21, 19, 
20, 19, 20, 28, 12, 20, 12, 14, 19, 22, 18), class = "difftime", units = "days")), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

and how it all works

Days=createDays(min(tempdf$DateStart),
                max(tempdf$DateEnd))
  
Counts=CaseCountGenerator(tempdf$DateStart,
                          tempdf$DateEnd,
                          tempdf$Length,
                          Days)
  
result=tibble(Days,Counts)

Solution

  • Didn't test the speed but here is a base R option with outer + rowSums that you can try

    with(tempdf, {
      d <- seq(min(DateStart), max(DateEnd), by = "day")
      cnt <- rowSums(outer(d, DateStart, `>=`) & outer(d, DateEnd, `<=`))
      data.frame(date = d, counts = cnt)
    })
    

    which gives

              date counts
    1   2020-03-02      1
    2   2020-03-03      1
    3   2020-03-04      1
    4   2020-03-05      1
    5   2020-03-06      1
    6   2020-03-07      1
    7   2020-03-08      1
    8   2020-03-09      1
    9   2020-03-10      1
    10  2020-03-11      1
    11  2020-03-12      1
    12  2020-03-13      1
    13  2020-03-14      1
    14  2020-03-15      2
    15  2020-03-16      2
    16  2020-03-17      2
    17  2020-03-18      2
    18  2020-03-19      2
    19  2020-03-20      2
    20  2020-03-21      2
    21  2020-03-22      2
    22  2020-03-23      2
    23  2020-03-24      2
    24  2020-03-25      2
    25  2020-03-26      2
    26  2020-03-27      2
    27  2020-03-28      2
    28  2020-03-29      1
    29  2020-03-30      2
    30  2020-03-31      2
    31  2020-04-01      2
    32  2020-04-02      2
    33  2020-04-03      2
    34  2020-04-04      2
    35  2020-04-05      2
    36  2020-04-06      2
    37  2020-04-07      2
    38  2020-04-08      2
    39  2020-04-09      2
    40  2020-04-10      1
    41  2020-04-11      0
    42  2020-04-12      0
    43  2020-04-13      0
    44  2020-04-14      0
    45  2020-04-15      0
    46  2020-04-16      0
    47  2020-04-17      0
    48  2020-04-18      0
    49  2020-04-19      0
    50  2020-04-20      0
    51  2020-04-21      0
    52  2020-04-22      0
    53  2020-04-23      0
    54  2020-04-24      0
    55  2020-04-25      0
    56  2020-04-26      0
    57  2020-04-27      0
    58  2020-04-28      0
    59  2020-04-29      0
    60  2020-04-30      0
    61  2020-05-01      0
    62  2020-05-02      0
    63  2020-05-03      0
    64  2020-05-04      0
    65  2020-05-05      0
    66  2020-05-06      0
    67  2020-05-07      0
    68  2020-05-08      0
    69  2020-05-09      0
    70  2020-05-10      0
    71  2020-05-11      0
    72  2020-05-12      0
    73  2020-05-13      0
    74  2020-05-14      0
    75  2020-05-15      0
    76  2020-05-16      0
    77  2020-05-17      0
    78  2020-05-18      0
    79  2020-05-19      0
    80  2020-05-20      0
    81  2020-05-21      0
    82  2020-05-22      0
    83  2020-05-23      0
    84  2020-05-24      0
    85  2020-05-25      0
    86  2020-05-26      0
    87  2020-05-27      0
    88  2020-05-28      0
    89  2020-05-29      0
    90  2020-05-30      0
    91  2020-05-31      0
    92  2020-06-01      0
    93  2020-06-02      1
    94  2020-06-03      1
    95  2020-06-04      2
    96  2020-06-05      3
    97  2020-06-06      3
    98  2020-06-07      3
    99  2020-06-08      3
    100 2020-06-09      3
    101 2020-06-10      3
    102 2020-06-11      3
    103 2020-06-12      3
    104 2020-06-13      3
    105 2020-06-14      3
    106 2020-06-15      4
    107 2020-06-16      4
    108 2020-06-17      4
    109 2020-06-18      4
    110 2020-06-19      4
    111 2020-06-20      4
    112 2020-06-21      3
    113 2020-06-22      3
    114 2020-06-23      3
    115 2020-06-24      3
    116 2020-06-25      4
    117 2020-06-26      7
    118 2020-06-27      8
    119 2020-06-28      8
    120 2020-06-29      8
    121 2020-06-30      8
    122 2020-07-01     11
    123 2020-07-02     12
    124 2020-07-03     12
    125 2020-07-04     12
    126 2020-07-05     11
    127 2020-07-06     11
    128 2020-07-07     12
    129 2020-07-08     13
    130 2020-07-09     15
    131 2020-07-10     15
    132 2020-07-11     15
    133 2020-07-12     15
    134 2020-07-13     15
    135 2020-07-14     15
    136 2020-07-15     15
    137 2020-07-16     12
    138 2020-07-17     10
    139 2020-07-18      8
    140 2020-07-19      8
    141 2020-07-20      8
    142 2020-07-21      7
    143 2020-07-22      4
    144 2020-07-23      4
    145 2020-07-24      4
    146 2020-07-25      4
    147 2020-07-26      4
    148 2020-07-27      4
    149 2020-07-28      2
    150 2020-07-29      2