I want to parse log files from Rackspace using the official Python SDK.
Previously I saved each file to disk and then read it from there with gzip.open.
Now I'm on Heroku and can't (and don't want to) save the file to disk; I'd like to do the unzipping in memory instead.
However, I can't manage to download the object as a string or a file-like object that I can work with.
Does anyone have an idea?
logString = ''
buffer = logfile.stream()
while True:
    try:
        logString += buffer.next()
    except StopIteration:
        break
# logString is always empty here
# I'd like to have something that enables me to do this:
for line in zlib.decompress(logString):
    # having each line of the log here
Update
I've noticed that "always empty" is not entirely true. This code runs inside a loop, and only the first occurrence is empty. For the following occurrences I do get data (which looks gzipped), but then I get this zlib error:
zlib.error: Error -3 while decompressing data: incorrect header check
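This error is what zlib typically raises when it is handed gzip-framed data: by default, zlib.decompress expects a zlib header, not a gzip one. The following is a minimal sketch in Python 3 (not the Python 2 used in this post), assuming the payload really is gzip. The sample bytes are made up for illustration and stand in for the downloaded object.

```python
import gzip
import zlib

# Hypothetical stand-in for the downloaded log object (gzip-compressed bytes).
raw = gzip.compress(b"first line\nsecond line\n")

try:
    zlib.decompress(raw)  # default wbits expects a zlib header
    default_error = None
except zlib.error as exc:
    default_error = str(exc)

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
text = zlib.decompress(raw, 16 + zlib.MAX_WBITS)

# Split the decompressed bytes into lines rather than iterating byte by byte.
for line in text.splitlines():
    print(line)
```

Passing the wbits value is only one option; wrapping the bytes in a file-like object and using the gzip module would be another.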
Update II
As suggested, I switched to cStringIO, with the same result:
buffer = logfile.stream()
output = cStringIO.StringIO()
while True:
    try:
        output.write(buffer.next())
    except StopIteration:
        break
print(output.getvalue())
Update III
This does work now:
output = cStringIO.StringIO()
try:
    for buffer in logfile.stream():
        output.write(buffer)
except StopIteration:
    break
At least nothing crashes here, but I don't seem to get any actual lines:
for line in gzip.GzipFile(fileobj=output).readlines():
    # this is never reached
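One likely reason the loop body is never reached: after the writes, the buffer's position sits at the end, so GzipFile finds nothing left to read. Rewinding with seek(0) before reading usually fixes that. This is a sketch in Python 3, with io.BytesIO standing in for cStringIO; the chunks list is a hypothetical stand-in for what logfile.stream() yields.

```python
import gzip
import io

# Hypothetical stand-in for the chunks yielded by logfile.stream().
payload = gzip.compress(b"line one\nline two\n")
chunks = [payload[:10], payload[10:]]

output = io.BytesIO()
for chunk in chunks:
    output.write(chunk)

# Without rewinding, reading starts at the end of the buffer: no lines.
lines_without_seek = gzip.GzipFile(fileobj=output).readlines()

output.seek(0)  # rewind to the start before decompressing
lines_with_seek = gzip.GzipFile(fileobj=output).readlines()

print(len(lines_without_seek), len(lines_with_seek))
```

Constructing the buffer directly from the complete bytes, instead of writing into it, avoids the issue because the position then starts at 0.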
How should I proceed from here? Is there an easy way to view the incoming data as a normal string, so I can tell whether I'm on the right track?
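A quick sanity check is to look at the first bytes of the data: every gzip stream starts with the two magic bytes 0x1f 0x8b. The sketch below is Python 3; the helper name looks_like_gzip and the sample payload are made up for illustration and are not part of any SDK.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Return True if data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Hypothetical payload standing in for the downloaded object.
sample = gzip.compress(b"GET /index.html 200\n")

print(looks_like_gzip(sample))
print(looks_like_gzip(b"plain text"))
# Printing a short prefix shows the raw header bytes without flooding the console.
print(sample[:16])
```

If the check fails on real data, the object is probably either empty or not gzip-compressed at all, which narrows down where to look next.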
I found out that read() is also an option, which led to a simple solution like this:
io = cStringIO.StringIO(logfile.read())
for line in GzipFile(fileobj=io).readlines():
    impression = LogParser._parseLine(line)
    if impression is not None:
        impressions.append(impression)
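For anyone on Python 3, where cStringIO no longer exists, roughly the same approach might look like the sketch below. It assumes the downloaded object is available as bytes (as logfile.read() would presumably return). The functions parse_line and parse_gzipped_log are hypothetical; parse_line only stands in for LogParser._parseLine, and the UTF-8 decoding is an assumption about the log encoding.

```python
import gzip
import io


def parse_line(line):
    """Hypothetical stand-in for LogParser._parseLine: skip blank lines."""
    line = line.strip()
    return line or None


def parse_gzipped_log(data):
    """Decompress gzip bytes in memory and parse them line by line."""
    impressions = []
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
        for raw in fh:  # iterates line by line instead of calling readlines()
            impression = parse_line(raw.decode("utf-8"))  # encoding is assumed
            if impression is not None:
                impressions.append(impression)
    return impressions


# Made-up sample data standing in for logfile.read().
sample = gzip.compress(b"a\n\nb\n")
impressions = parse_gzipped_log(sample)
print(impressions)
```

Iterating over the GzipFile directly avoids building the full list of lines that readlines() would create, which can matter for large log files.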