I need your help with the following problem. My data is
ack det uniquenr
02mar2015 10:30:19 27feb2015 17:03:25 1
09mar2015 11:56:44 07mar2015 22:05:52 2
16mar2015 09:59:51 09mar2015 11:08:36 2
16mar2015 09:59:51 09mar2015 11:08:36 2
16mar2015 09:59:51 09mar2015 11:08:36 2
16mar2015 09:59:51 09mar2015 11:08:36 2
24feb2015 15:03:00 21feb2015 10:23:00 3
ack
and det
are datetimes. det
must be matched to the earliest possible following ack
with the same unique number. As appears in the example above, det
of 09mar2015 is wrong. I would like the result to be:
ack det uniquenr
02mar2015 10:30:19 27feb2015 17:03:25 1
09mar2015 11:56:44 07mar2015 22:05:52 2
09mar2015 11:56:44 09mar2015 11:08:36 2
09mar2015 11:56:44 09mar2015 11:08:36 2
09mar2015 11:56:44 09mar2015 11:08:36 2
09mar2015 11:56:44 09mar2015 11:08:36 2
24feb2015 15:03:00 21feb2015 10:23:00 3
I have tried the following code:
sort uniquenr det
bysort uniquenr (det): gen v1= (ack[_n-1]-det)/(1000*60*60)
gen v2= (ack-det)/(1000*60*60)
gen bedrenabo=1 if v1<v2
order bedrenabo v1 v2
replace bedrenabo=0 if v1==v2
replace bedrenabo=0 if v1<0
replace bedrenabo=0 if v1>v2
gen double newack1=ack[_n-1] if bedrenabo==1
format newack %tc
order Handling newack1 ack det tra
sort uniquenr det
replace ack=newack1 if newack1!=.
and my result is
ack det uniquenr newack
02mar2015 10:30:19 27feb2015 17:03:25 1
09mar2015 11:56:44 07mar2015 22:05:52 2
16mar2015 09:59:51 09mar2015 11:08:36 2 09mar2015 11:56:44
16mar2015 09:59:51 09mar2015 11:08:36 2
16mar2015 09:59:51 09mar2015 11:08:36 2
16mar2015 09:59:51 09mar2015 11:08:36 2
24feb2015 15:03:00 21feb2015 10:23:00 3
The problem seems to arise because there are duplicates in the data and I use [_n-1]
. The optimal solution would be to use a replace
command which for all observations with the same det
and uniquenr
replaces all missing values with the known newack
- if there is a newack
. The command bysort uniquenr det : replace nyack1 = newack1[_n-1] if missing(newack1)
gives different amounts of change for each time.
I am not yet familiar with foreach
loops, but if that's the way, I am open to it.
Ok, ok, assuming that I understand the question, here is one possible solution.
Please note, that I've modified you original example to make it more general (otherwise a simple egen
will do, but it seems to me that is not what you want).
// Input the data
// @Msh, generally it is helpful if you provide somthing like this with your Q.
clear
input str18 ackstr str18 detstr uniquenr
"02mar2015 10:30:19" "27feb2015 17:03:25" 1
"09mar2015 11:56:44" "07mar2015 22:05:52" 2
"16mar2015 09:59:51" "09mar2015 11:08:36" 2
"16mar2015 09:59:51" "09mar2015 11:08:36" 2
"16mar2015 09:59:51" "09mar2015 11:08:36" 2
"16mar2015 09:59:51" "09mar2015 11:08:36" 2
"24feb2015 15:03:00" "21feb2015 10:23:00" 3
end
gen double ack = clock(ackstr, "DMYhms")
gen double det = clock(detstr, "DMYhms")
format ack det %tc
drop *str
// I will modify the original data to make it more general
replace ack = tc(08mar2015 00:00:00) in 5
// Add row identifiers
gen id = _n
// Create all possible combinations of ack and det withing the same uniquenr
preserve
keep uniquenr ack
rename ack new=
tempfile ack
save `ack'
restore
joinby uniquenr using `ack'
// Drop ack that are smaller than det
drop if newack < det
// Match the smallest ack to a given det
ds newack id, not
collapse (min) newack (last) `r(varlist)', by(id)
You can also do it with a loop, e.g. in this manner:
// Add row numbers
gen id = _n
// Prepare newack
gen double newack = .
format newack %tc
// Ensure the data is sorted
sort uniquenr ack
// Compute block boundaries
gen row = _n
by uniquenr: gen min = row[1]
by uniquenr: gen max = row[_N]
// Compute new ack
forvalues i = 1/`c(N)' {
forvalues j = `=min[`i']'/`=max[`i']' {
if `=ack[`j']' >= `=det[`i']' {
replace newack = ack[`j'] in `i'
continue, break
}
}
}
// Revert the sorting
sort id
If you have a large dataset, it might make sense to rewrite the loop in mata, which should be faster.