I'm trying to parse results from HTTP queries that can return up to millions of lines, where each line needs to be parsed. Ideally I would read a line at a time from the connection and parse it as I go, basically a FileHandle-esque iterator, but the existing HTTP libraries all seem to fetch the entire content at once, although one can (a) save it to a file, or (b) process chunks via a code ref.

(a) is not ideal because it is a two-pass solution: the file has to be read line by line after the data is transmitted, and it takes up storage, perhaps unnecessarily. (b) is not ideal because I would like to return each line rather than handle it inside a code ref, and besides, a chunk is not a line, so that LWP solution gives no help with reassembling lines from chunks.

I know there are non-blocking solutions (using AnyEvent and Coro), but these seem more interested in non-blocking-ness than in line-by-line processing. Can anyone point me in a good direction here, or am I barking up the wrong tree?
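For reference, option (b) with LWP looks roughly like this; the URL is a placeholder and the callback body is just illustrative of where the chunks arrive:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder URL for the query endpoint.
my $url = 'http://example.com/big-query';

my $ua = LWP::UserAgent->new;

# LWP invokes this code ref with each chunk as it arrives; chunk
# boundaries have nothing to do with line boundaries.
my $response = $ua->get(
    $url,
    ':content_cb' => sub {
        my ($chunk, $res, $proto) = @_;
        print "got ", length($chunk), " bytes\n";
    },
);
```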
The callback lets you do anything you want: you can buffer the input as it arrives and read lines out of the buffer. Perl lets you open filehandles on just about anything (using tie), including strings (with open). Anything else you might find is ultimately going to receive a chunk and turn it into lines anyway.
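Here is a minimal sketch of that buffering idea, assuming LWP's :content_cb callback; the URL and handle_line() are placeholders for your endpoint and per-line parsing:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://example.com/big-query';   # placeholder endpoint

my $buffer = '';

# Placeholder for whatever per-line parsing you need to do.
sub handle_line {
    my ($line) = @_;
    chomp $line;
    # ... parse $line here ...
}

my $ua = LWP::UserAgent->new;
$ua->get(
    $url,
    ':content_cb' => sub {
        my ($chunk) = @_;
        $buffer .= $chunk;
        # Peel every complete (newline-terminated) line off the front of
        # the buffer; a trailing partial line waits for the next chunk.
        while ($buffer =~ s/\A([^\n]*\n)//) {
            handle_line($1);
        }
    },
);

# Anything left after the last chunk is a final line with no newline.
handle_line($buffer) if length $buffer;
```

If you specifically want filehandle semantics, you can also open an in-memory handle on the buffered data (open my $fh, '<', \$buffer) and read it with <$fh>; that is the open-on-a-string trick mentioned above.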