Bug in StringIO module python using numpy


Very simple code:

import StringIO
import numpy as np
c = StringIO.StringIO()
c.write("1 0")
a = np.loadtxt(c)
print a

I get an empty array, plus a warning that c is an empty file.

I fixed this by adding:

d=StringIO.StringIO(c.getvalue())
a = np.loadtxt(d)

I think such a thing shouldn't happen, what is happening here?


Solution

  • StringIO is a file-like object. As such it has behaviors consistent with a file. There is a notion of a file pointer - the current position within the file. When you write data to a StringIO object the file pointer is adjusted to the end of the data. When you try to read it, the file pointer is already at the end of the buffer, so no data is returned.

    To read it back you can do one of two things:

    • Use StringIO.getvalue() as you already discovered. This returns the data from the beginning of the buffer, leaving the file pointer unchanged.
    • Use StringIO.seek(0) to reposition the file pointer to the start of the buffer, and then call StringIO.read() (or hand the object to whatever reads from it, such as np.loadtxt) to read the data.

    Demo

    >>> from StringIO import StringIO
    
    >>> s = StringIO()
    >>> s.write('hi there')
    >>> s.read()
    ''
    >>> s.tell()    # shows the current position of the file pointer
    8
    >>> s.getvalue()
    'hi there'
    >>> s.tell()
    8
    >>> s.read()
    ''
    >>> s.seek(0)
    >>> s.tell()
    0
    >>> s.read()
    'hi there'
    >>> s.tell()
    8
    >>> s.read()
    ''
    

    There is one exception to this. If you provide a value when you create the StringIO, the buffer is initialised with that value, but the file pointer is positioned at the start of the buffer rather than the end:

    >>> s = StringIO('hi there')
    >>> s.tell()
    0
    >>> s.read()
    'hi there'
    >>> s.read()
    ''
    >>> s.tell()
    8
    

    And that is why it works when you use

    d=StringIO.StringIO(c.getvalue())
    

    because you are initialising the StringIO object at creation time, and the file pointer is positioned at the beginning of the buffer.
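
For completeness, here is a minimal sketch of the other fix, seek(0), applied to the code from the question. It assumes Python 3, where StringIO lives in the io module (the question uses the Python 2 StringIO module), and that numpy is installed:

```python
import io  # assumption: Python 3; the question's code uses the Python 2 StringIO module

import numpy as np

c = io.StringIO()
c.write("1 0")

# After write() the file pointer sits at the end of the buffer, so
# rewind it before handing the object to anything that reads from it.
c.seek(0)

a = np.loadtxt(c)
print(a)
```

This avoids building a second StringIO object, at the cost of having to remember to rewind before every read.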