Data
Given the following data frame where each row represents information on a single patient
DateStart
contains the date of admission to the hospital of the individual patientDateEnd
contains the day of hospital discharge of the individual patientLength
is just the difference of DateEnd
and DateStart
DateStart | DateEnd | Length |
---|---|---|
2020-03-02 | 2020-04-10 | 39 days |
2020-03-30 | 2020-04-09 | 10 days |
2020-03-15 | 2020-03-28 | 13 days |
2020-06-02 | 2020-06-20 | 18 days |
and I basically want to transform this into a time series of counts of patients per day kinda like the sample below
Days | Frequency |
---|---|
2020-03-02 | 1 |
2020-03-03 | 1 |
2020-03-04 | 1 |
2020-03-05 | 1 |
2020-03-06 | 1 |
2020-03-07 | 1 |
2020-03-08 | 1 |
2020-03-09 | 1 |
2020-03-10 | 1 |
2020-03-11 | 1 |
Problem
The issue is that the one I made to solve this is kinda slow, CaseCountGenerator()
, and I wonder if there is a more efficient approach.
my working code is as follows
library(tidyverse)
createDays=function(start,end){
Days=seq.Date(from=start,to=end,by=1)
return(Days)
}
CaseCountGenerator=function(Starts,Ends,Length,Days){
Counts=rep(0,length(Days))
for(j in 1:length(Starts)){
for(l in 1:length(Days)){
if(Days[l]==Starts[j]){
for(m in ((1:Length[j])-1)){
Counts[l+m]=Counts[l+m]+1
}
break
}
}
}
return(Counts)
}
a sample data
tempdf = structure(list(DateStart = structure(c(18323, 18351, 18336, 18415,
18417, 18418, 18428, 18439, 18439, 18438, 18440, 18439, 18444,
18451, 18444, 18452, 18444, 18445, 18450, 18452), tzone = "Asia/Manila", class = "Date"),
DateEnd = structure(c(18362, 18361, 18349, 18433, 18470,
18460, 18447, 18460, 18458, 18458, 18459, 18459, 18472, 18463,
18464, 18464, 18458, 18464, 18472, 18470), tzone = "Asia/Manila", class = "Date"),
Length = structure(c(39, 10, 13, 18, 53, 42, 19, 21, 19,
20, 19, 20, 28, 12, 20, 12, 14, 19, 22, 18), class = "difftime", units = "days")), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
and how it all works
Days=createDays(min(tempdf$DateStart),
max(tempdf$DateEnd))
Counts=CaseCountGenerator(tempdf$DateStart,
tempdf$DateEnd,
tempdf$Length,
Days)
result=tibble(Days,Counts)
Didn't test the speed but here is a base R option with outer
+ rowSums
that you can try
with(tempdf, {
d <- seq(min(DateStart), max(DateEnd), by = "day")
cnt <- rowSums(outer(d, DateStart, `>=`) & outer(d, DateEnd, `<=`))
data.frame(date = d, counts = cnt)
})
which gives
date counts
1 2020-03-02 1
2 2020-03-03 1
3 2020-03-04 1
4 2020-03-05 1
5 2020-03-06 1
6 2020-03-07 1
7 2020-03-08 1
8 2020-03-09 1
9 2020-03-10 1
10 2020-03-11 1
11 2020-03-12 1
12 2020-03-13 1
13 2020-03-14 1
14 2020-03-15 2
15 2020-03-16 2
16 2020-03-17 2
17 2020-03-18 2
18 2020-03-19 2
19 2020-03-20 2
20 2020-03-21 2
21 2020-03-22 2
22 2020-03-23 2
23 2020-03-24 2
24 2020-03-25 2
25 2020-03-26 2
26 2020-03-27 2
27 2020-03-28 2
28 2020-03-29 1
29 2020-03-30 2
30 2020-03-31 2
31 2020-04-01 2
32 2020-04-02 2
33 2020-04-03 2
34 2020-04-04 2
35 2020-04-05 2
36 2020-04-06 2
37 2020-04-07 2
38 2020-04-08 2
39 2020-04-09 2
40 2020-04-10 1
41 2020-04-11 0
42 2020-04-12 0
43 2020-04-13 0
44 2020-04-14 0
45 2020-04-15 0
46 2020-04-16 0
47 2020-04-17 0
48 2020-04-18 0
49 2020-04-19 0
50 2020-04-20 0
51 2020-04-21 0
52 2020-04-22 0
53 2020-04-23 0
54 2020-04-24 0
55 2020-04-25 0
56 2020-04-26 0
57 2020-04-27 0
58 2020-04-28 0
59 2020-04-29 0
60 2020-04-30 0
61 2020-05-01 0
62 2020-05-02 0
63 2020-05-03 0
64 2020-05-04 0
65 2020-05-05 0
66 2020-05-06 0
67 2020-05-07 0
68 2020-05-08 0
69 2020-05-09 0
70 2020-05-10 0
71 2020-05-11 0
72 2020-05-12 0
73 2020-05-13 0
74 2020-05-14 0
75 2020-05-15 0
76 2020-05-16 0
77 2020-05-17 0
78 2020-05-18 0
79 2020-05-19 0
80 2020-05-20 0
81 2020-05-21 0
82 2020-05-22 0
83 2020-05-23 0
84 2020-05-24 0
85 2020-05-25 0
86 2020-05-26 0
87 2020-05-27 0
88 2020-05-28 0
89 2020-05-29 0
90 2020-05-30 0
91 2020-05-31 0
92 2020-06-01 0
93 2020-06-02 1
94 2020-06-03 1
95 2020-06-04 2
96 2020-06-05 3
97 2020-06-06 3
98 2020-06-07 3
99 2020-06-08 3
100 2020-06-09 3
101 2020-06-10 3
102 2020-06-11 3
103 2020-06-12 3
104 2020-06-13 3
105 2020-06-14 3
106 2020-06-15 4
107 2020-06-16 4
108 2020-06-17 4
109 2020-06-18 4
110 2020-06-19 4
111 2020-06-20 4
112 2020-06-21 3
113 2020-06-22 3
114 2020-06-23 3
115 2020-06-24 3
116 2020-06-25 4
117 2020-06-26 7
118 2020-06-27 8
119 2020-06-28 8
120 2020-06-29 8
121 2020-06-30 8
122 2020-07-01 11
123 2020-07-02 12
124 2020-07-03 12
125 2020-07-04 12
126 2020-07-05 11
127 2020-07-06 11
128 2020-07-07 12
129 2020-07-08 13
130 2020-07-09 15
131 2020-07-10 15
132 2020-07-11 15
133 2020-07-12 15
134 2020-07-13 15
135 2020-07-14 15
136 2020-07-15 15
137 2020-07-16 12
138 2020-07-17 10
139 2020-07-18 8
140 2020-07-19 8
141 2020-07-20 8
142 2020-07-21 7
143 2020-07-22 4
144 2020-07-23 4
145 2020-07-24 4
146 2020-07-25 4
147 2020-07-26 4
148 2020-07-27 4
149 2020-07-28 2
150 2020-07-29 2