I'm trying to parse iostat -xt output using Python. The quirk with iostat is that the output for each second runs over multiple lines. For example:
06/30/2015 03:09:17 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.03 0.00 0.03 0.00 0.00 99.94
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 0.04 0.02 0.07 0.30 3.28 81.37 0.00 29.83 2.74 38.30 0.47 0.00
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 11.62 0.00 0.23 0.19 2.13 0.16 0.00
xvdf 0.00 0.00 0.00 0.00 0.00 0.00 10.29 0.00 0.41 0.41 0.73 0.38 0.00
xvdg 0.00 0.00 0.00 0.00 0.00 0.00 9.12 0.00 0.36 0.35 1.20 0.34 0.00
xvdh 0.00 0.00 0.00 0.00 0.00 0.00 33.35 0.00 1.39 0.41 8.91 0.39 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 11.66 0.00 0.46 0.46 0.00 0.37 0.00
06/30/2015 03:09:18 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.50 0.00 0.00 99.50
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
06/30/2015 03:09:19 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.50 0.00 0.00 99.50
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Essentially, I need to parse the output in "chunks", where each chunk starts with a timestamp line.
I looked at itertools.groupby(), but it doesn't seem to fit here: it groups consecutive lines that share a common key computed by a function, and there is no obvious per-line key that tells me which chunk a line belongs to.
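For what it's worth, `groupby()` can handle this if the key function keeps a little state: a key that increments a counter every time it sees a timestamp line gives every line up to the next timestamp the same key. A minimal sketch, assuming timestamps look exactly like `06/30/2015 03:09:17 PM`; `ChunkKey` and `chunks` are hypothetical names:

```python
import itertools
import re

# Assumed timestamp shape, e.g. "06/30/2015 03:09:17 PM"
TIMESTAMP = re.compile(r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M$")


class ChunkKey:
    """Hypothetical stateful key: the counter goes up on each timestamp line."""

    def __init__(self):
        self.count = 0

    def __call__(self, line):
        if TIMESTAMP.match(line.strip()):
            self.count += 1
        return self.count


def chunks(lines):
    """Yield one list of lines per sample, with the timestamp line first."""
    for _, group in itertools.groupby(lines, key=ChunkKey()):
        yield list(group)
```

With an open file this would be `for chunk in chunks(f): ...`, where `chunk[0]` is the timestamp line and `chunk[1:]` is the rest of that sample. Any lines before the first timestamp end up in their own leading chunk (key 0).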
Another thought was something like:
for line in f:
    if line.count("/") == 2 and line.count(":") == 2:
        current_time = datetime.strptime(line.strip(), '%m/%d/%y %H:%M:%S')
    while line.count("/") != 2 and line.count(":") != 2:
        print(line)
        continue
But that didn't quite seem to work.
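Two things trip up that attempt. The format `'%m/%d/%y %H:%M:%S'` does not match `06/30/2015 03:09:17 PM`: the four-digit year needs `%Y`, and the 12-hour clock with AM/PM needs `%I` and `%p`, so `strptime()` raises `ValueError` on the very first line. And the inner `while` loop never advances `line`, so on a non-timestamp line it would print the same line forever. A sketch of a hypothetical helper that lets `strptime()` itself decide whether a line is a timestamp (assuming an English locale, so `%p` matches `AM`/`PM`):

```python
from datetime import datetime

# Timestamp format as it appears in the output above (English locale assumed for %p)
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_timestamp(line):
    """Return a datetime if the line is a timestamp line, otherwise None."""
    try:
        return datetime.strptime(line.strip(), TIMESTAMP_FORMAT)
    except ValueError:
        # Header and data lines do not match the format
        return None
```

For example, `parse_timestamp("06/30/2015 03:09:17 PM")` gives `datetime(2015, 6, 30, 15, 9, 17)`, while `parse_timestamp("avg-cpu: %user ...")` gives `None`.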
Is there a Pythonic way of parsing the above iostat output and breaking it into chunks split by the timestamp?
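You can use a regular expression to recognize the timestamp lines, and a generator that collects the lines between them: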
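import re
from datetime import datetime

# Matches timestamp lines such as "06/30/2015 03:09:17 PM"
date_reg = re.compile(r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} (?:AM|PM)")

def split_by_date(text_iter):
    """Yield a (timestamp, lines) pair for each iostat sample."""
    date = None
    lines = []
    for line in text_iter:
        if date_reg.match(line):
            # A new timestamp starts a new chunk: emit the previous one first
            if lines or date:
                yield (date, lines)
            # 4-digit year (%Y), 12-hour clock (%I) and AM/PM (%p)
            date = datetime.strptime(line.strip(), '%m/%d/%Y %I:%M:%S %p')
            lines = []
        else:
            lines.append(line)
    # The last chunk has no following timestamp, so emit it here
    if lines or date:
        yield (date, lines)

# f is the open iostat output file (any iterable of lines works)
for date, lines in split_by_date(f):
    # here you have the lines for each encountered date
    for line in lines:
        print(line.rstrip('\n'))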
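Once each chunk is isolated, its lines can be turned into dictionaries keyed by the column headers. A sketch, assuming the layout shown in the question (an `avg-cpu:` header followed by one row of values, then a `Device:` header followed by one row per device); `parse_chunk` is a hypothetical name, and newer sysstat versions may print different headers or columns:

```python
def parse_chunk(lines):
    """Turn one iostat sample's lines into (cpu, devices) dictionaries.

    Assumes the "avg-cpu:" / "Device:" header lines of `iostat -x` shown above.
    Lines seen before any header (e.g. a timestamp line) are ignored.
    """
    cpu = {}
    devices = {}
    header = None   # column names of the section currently being read
    section = None  # "cpu" or "device"
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank separator lines
        if fields[0] == "avg-cpu:":
            section, header = "cpu", fields[1:]
        elif fields[0] == "Device:":
            section, header = "device", fields[1:]
        elif section == "cpu":
            cpu = dict(zip(header, map(float, fields)))
        elif section == "device":
            # First field is the device name, the rest are numeric columns
            devices[fields[0]] = dict(zip(header, map(float, fields[1:])))
    return cpu, devices
```

It plugs into the generator above as `for date, lines in split_by_date(f): cpu, devices = parse_chunk(lines)`, after which e.g. `devices["xvdb"]["%util"]` holds that device's utilization for the sample.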