Processing textfile in chunks, where each chunk is separated by timestamp

I'm trying to parse iostat -xt output using Python. The quirk with iostat is that the output for each second runs over multiple lines. For example:

06/30/2015 03:09:17 PM 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
           0.03    0.00    0.03    0.00    0.00   99.94 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
xvdap1            0.00     0.04    0.02    0.07     0.30     3.28    81.37     0.00   29.83    2.74   38.30   0.47   0.00 
xvdb              0.00     0.00    0.00    0.00     0.00     0.00    11.62     0.00    0.23    0.19    2.13   0.16   0.00 
xvdf              0.00     0.00    0.00    0.00     0.00     0.00    10.29     0.00    0.41    0.41    0.73   0.38   0.00 
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     9.12     0.00    0.36    0.35    1.20   0.34   0.00 
xvdh              0.00     0.00    0.00    0.00     0.00     0.00    33.35     0.00    1.39    0.41    8.91   0.39   0.00 
dm-0              0.00     0.00    0.00    0.00     0.00     0.00    11.66     0.00    0.46    0.46    0.00   0.37   0.00 

06/30/2015 03:09:18 PM 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
           0.00    0.00    0.50    0.00    0.00   99.50 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 

06/30/2015 03:09:19 PM 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
           0.00    0.00    0.50    0.00    0.00   99.50 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00 
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Essentially I need to parse the output in "chunks", where each chunk is separated by a timestamp.

I was looking at itertools.groupby(), but that doesn't seem to quite do what I want here - it seems more for grouping lines, where each is united by a common key, or something that you can use a function to check for.

Another thought was something like:

for line in f: 
    if line.count("/") == 2 and line.count(":") == 2: 
        current_time = datetime.strptime(line.strip(), '%m/%d/%y %H:%M:%S') 
    while line.count("/") != 2 and line.count(":") != 2: 
        print(line) 
        continue

But that didn't quite seem to work.

Is there a Pythonic way of parsing the above iostat output, and break it into chunks split by the timestamp?

Solution

You can use regex:

import re
date_reg = "([0-9]{2}\/[0-9]{2}\/[0-9]{4} [0-9]{2}\:[0-9]{2}\:[0-9]{2} (?:AM|PM))"
def split_by_date(text_iter):
    date = None
    lines = []
    for line in text_iter:
        if re.match(date_reg, line):
            if lines or date:
                yield (date, lines)
            date = datetime.strptime(line.strip(), '%m/%d/%y %H:%M:%S')
            lines = []
        else:
            lines.append(line)

    yield (date, lines)

for date, lines in split_by_date(f):
    # here you have lines for each encountered date
    for line in lines:
        print line