
Creating a stream to iterate over from a string in Python


I want to create a stream from a string in Python so that reading it is equivalent to reading the same text from a file. Something like:

for line in open('myfile.txt'):
    print(line)

except the contents of 'myfile.txt' are stored in a string s. Is this the correct/best way to do it?

from io import StringIO

s = StringIO("a\tb\nc\td\n")
for line in s:
    print(line)

Solution

  • I want to create a stream from a string in Python so that it's equivalent to reading the string as if it's read from a text file.

    Is this the correct/best way to do it?

    Yes, unless you really do want it in a list.

    If it is intended to be consumed line by line, the way you are doing it makes sense.

    StringIO() creates a file-like object.
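One consequence of being file-like is that the object can be handed to code that expects an open text file. The sketch below is only an illustration: it assumes the string holds tab-separated data, and the names `text`, `stream`, and `rows` are made up for the example.

```python
import csv
from io import StringIO

# Tab-separated text held in memory (hypothetical sample data).
text = "a\tb\nc\td\n"

# csv.reader accepts any iterable that yields lines,
# so a StringIO can stand in for an open file.
stream = StringIO(text)
rows = list(csv.reader(stream, delimiter="\t"))

print(rows)  # [['a', 'b'], ['c', 'd']]
```

The same substitution works for most functions that only iterate over or read from the file object they are given.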

    File objects have a method, .readlines(), which materializes the contents as a list. Instead of materializing the data in a list, you can iterate over the object, which uses less memory:

    # from StringIO import StringIO # Python 2 import
    from io import StringIO # Python 3 import
    
    txt = "foo\nbar\nbaz"
    

    Here we append each line to a list, to demonstrate iterating over the file-like object while keeping a handle on the data. (list(file_like_io) would be more efficient.)

    m_1 = []
    file_like_io = StringIO(txt)
    for line in file_like_io:
        m_1.append(line)
    

    And now:

    >>> m_1
    ['foo\n', 'bar\n', 'baz']
    

    You can move the stream back to any position with seek:

    >>> file_like_io.seek(0)
    0
    >>> file_like_io.tell() # where we are in the object now
    0
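The rewind matters because iterating consumes the stream: once the end is reached, a second pass yields nothing until the position is reset. A minimal sketch of that behaviour, with variable names chosen only for illustration:

```python
from io import StringIO

stream = StringIO("foo\nbar\nbaz")

first_pass = list(stream)        # reads every line, leaving the position at the end
second_pass = list(stream)       # nothing left to read, so this is empty
stream.seek(0)                   # rewind to the start
third_pass = stream.readlines()  # the full contents are available again

print(first_pass)   # ['foo\n', 'bar\n', 'baz']
print(second_pass)  # []
print(third_pass)   # ['foo\n', 'bar\n', 'baz']
```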
    

    If you really want it in a list

    .readlines() materializes the StringIO contents as a list, as if one had called list(file_like_io). This is generally less preferable than iterating line by line.

    >>> m_2 = file_like_io.readlines() 
    

    And we can see that our results are the same:

    >>> m_1 == m_2
    True
    

    Keep in mind that iteration splits after each newline and keeps the newline in the returned text, so print(line) emits two newlines per line (one from the text, one from print) and the output comes out double-spaced.
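Two common ways to avoid the double spacing are sketched here: either tell print() not to add its own newline, or strip the trailing newline from each line. This is a minimal illustration rather than the only approach, and the names `stream` and `stripped` are made up for the example.

```python
from io import StringIO

stream = StringIO("foo\nbar\nbaz")

# Option 1: keep the newline in the text and suppress the one print() adds.
for line in stream:
    print(line, end="")
print()  # the last line has no trailing newline, so finish the output here

# Option 2: rewind, then strip the trailing newline from each line.
stream.seek(0)
stripped = [line.rstrip("\n") for line in stream]
print(stripped)  # ['foo', 'bar', 'baz']
```

Option 1 reproduces the text exactly as stored. Option 2 is more convenient when the lines are processed further rather than just printed.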