Search code examples
stata

Range of times strings in Stata


I'm working with a Stata dataset that has periods of time saved in a rather odd way, in a string with the word "to" as a an indication of the range of time, with markers for the twelve hour clock, for example, "20march2020 1 p.m. to 3 p.m." I was wondering what the best way of parsing/using this information is, particularly with respect to datetime. I have read through the datetime documentation, and while it's useful for specific times of day, it is not particularly helpful when it comes to ranges of times.

I was considering separating the string into two strings, with the start and end of the range of times, e.g. "20march2020 1 p.m." and "20march2020 3 p.m.", but I was curious if there was a more direct solution to make this data workable. The main concern I have about my approach is automating a change of date if the time interval crosses midnight, e.g. "20march2020 11 p.m. to 1 a.m.". Any suggestions would be sincerely appreciated.

Here is some sample data:

input str28 times

"17may2020 1 p.m. to 10 p.m."
"17may2020 10 p.m. to 5 a.m." 
"18may2020 5 a.m. to noon"
"18may2020 noon to 7 p.m."
"18may2020 7 p.m. to 1 a.m."

Solution

  • clear
    input str28 times
    "17may2020 1 p.m. to 10 p.m."
    "17may2020 10 p.m. to 5 a.m." 
    "18may2020 5 a.m. to noon"
    "18may2020 noon to 7 p.m."
    "18may2020 7 p.m. to 1 a.m."
    end
    
    // Noon won't be recognized by clock(), so replace with 12 p.m.
    replace times = subinstr(times, "noon", "12 p.m.", .)
    
    // Split times in two variables
    gen times_only = substr(times, 11, .)
    split times_only , parse("to")
    
    // Generate datetime variables
    gen double datetime1 = clock(substr(times,1,10) + times_only1, "DMYh")
    gen double datetime2 = clock(substr(times,1,10) + times_only2, "DMYh")
    format datetime1 datetime2 %tc
    
    // If datetime2 is before datetime1, add one day (86400000 milliseconds)
    replace datetime2 = datetime2 + 86400000 if datetime2 < datetime1
    
    // Drop auxiliary variables
    drop times_only*
    
    // Admire the results
    list
    
         +-----------------------------------------------------------------------+
         |                       times            datetime1            datetime2 |
         |-----------------------------------------------------------------------|
      1. | 17may2020 1 p.m. to 10 p.m.   17may2020 13:00:00   17may2020 22:00:00 |
      2. | 17may2020 10 p.m. to 5 a.m.   17may2020 22:00:00   18may2020 05:00:00 |
      3. | 18may2020 5 a.m. to 12 p.m.   18may2020 05:00:00   18may2020 12:00:00 |
      4. | 18may2020 12 p.m. to 7 p.m.   18may2020 12:00:00   18may2020 19:00:00 |
      5. | 18may2020 7 p.m. to 12 a.m.   18may2020 19:00:00   19may2020 00:00:00 |
         +-----------------------------------------------------------------------+