Tags: python-3.x, numpy, stringio

StringIO example does not work


I am trying to understand how the numpy.genfromtxt function and io.StringIO work. On the official website (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt) I found some examples. Here is one of them:

import numpy as np
from io import StringIO

s = StringIO("1,1.3,abcde")
data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),('mystring','S5')], delimiter=",")

But when I run this code on my computer I get: TypeError: must be str or None, not bytes

How can I fix it?
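The message itself is the TypeError that Python 3 raises when a str is split on a bytes separator, and it can be reproduced without NumPy. Reading it as a sign that the installed NumPy predates 1.14 and mixes bytes and str inside genfromtxt is an assumption; the question does not show the NumPy version.

```python
# Plain Python 3, no NumPy involved: splitting a str on a bytes
# separator raises the same TypeError message quoted above.
line = "1,1.3,abcde"  # a str line, as io.StringIO would yield it
separator = b","      # a bytes separator (illustrative)

message = None
try:
    line.split(separator)
except TypeError as exc:
    message = str(exc)

print(message)  # must be str or None, not bytes
```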


Solution

    In [200]: np.__version__
    Out[200]: '1.14.0'
    

    With NumPy 1.14.0, the example works for me:

    In [201]: s = io.StringIO("1,1.3,abcde")
    In [202]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
         ...: ('mystring','S5')], delimiter=",")
    Out[202]: 
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
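The result is a structured array, so each column is read by its field name. A minimal sketch, assuming NumPy 1.14 or newer (where the StringIO form works, as shown above); `.item()` is used so the lookup works whether the single row comes back as a 0-d or a 1-element array:

```python
import io
import numpy as np

dt = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]
data = np.genfromtxt(io.StringIO("1,1.3,abcde"), dtype=dt, delimiter=",")

# Each named field is accessed like a dict key; .item() returns a Python scalar.
myint = data['myint'].item()
myfloat = data['myfloat'].item()
mystring = data['mystring'].item()  # an 'S5' field holds bytes, not str
print(myint, myfloat, mystring)
```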
    

    It also works for a byte string:

    In [204]: s = io.BytesIO(b"1,1.3,abcde")
    In [205]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
         ...: ('mystring','S5')], delimiter=",")
    Out[205]: 
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
    
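Since the BytesIO form works here too, encoding the text and passing a binary stream is a plausible workaround for an older NumPy that rejects StringIO. This is a sketch under the assumption that the text is ASCII (or that you know its encoding); `text_to_bytesio` is a hypothetical helper name, not a NumPy function:

```python
import io
import numpy as np

def text_to_bytesio(text, encoding="utf-8"):
    """Hypothetical helper: wrap a str as a binary stream for genfromtxt."""
    return io.BytesIO(text.encode(encoding))

dt = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]
# genfromtxt reads byte lines from the BytesIO, as in the session above.
data = np.genfromtxt(text_to_bytesio("1,1.3,abcde"), dtype=dt, delimiter=",")
print(data)
```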

    genfromtxt works with any iterable that yields lines, so when testing questions I usually pass a list of bytestrings directly:

    In [206]: s = [b"1,1.3,abcde"]
    In [207]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
         ...: ('mystring','S5')], delimiter=",")
    Out[207]: 
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
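Because only an iterable of lines is needed, a generator works as well as a list. A small sketch with illustrative data; genfromtxt's documentation lists generators among the accepted inputs:

```python
import numpy as np

raw = b"1,1.3,abcde\n4,1.3,two\n"
# A generator yields one byte line at a time; genfromtxt just iterates over it.
lines = (line for line in raw.splitlines())

dt = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]
data = np.genfromtxt(lines, dtype=dt, delimiter=",")
print(data)
```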
    

    Or with several lines:

    In [208]: s = b"""1,1.3,abcde
         ...: 4,1.3,two""".splitlines()
    In [209]: s
    Out[209]: [b'1,1.3,abcde', b'4,1.3,two']
    In [210]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
         ...: ('mystring','S5')], delimiter=",")
    Out[210]: 
    array([(1, 1.3, b'abcde'), (4, 1.3, b'two')],
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
    

    It used to be that, with dtype=None, genfromtxt always created S (bytestring) fields; see the related question "NumPy dtype issues in genfromtxt(), reads string in as bytestring".
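If you end up with S fields anyway (an explicit 'S5' dtype, or dtype=None on an older NumPy), one option is to cast the column to a unicode dtype after loading. A sketch assuming the bytes are ASCII; casting non-ASCII bytes this way may fail:

```python
import numpy as np

lines = [b"1,1.3,abcde", b"4,1.3,two"]
dt = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]
data = np.genfromtxt(lines, dtype=dt, delimiter=",")

# Casting the 'S5' column to 'U5' turns the bytes values into Python 3 str values.
strings = data['mystring'].astype('U5')
print(strings.tolist())
```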

    With 1.14, we can control the default string dtype through the new encoding argument:

    In [219]: s = io.StringIO("1,1.3,abcde")
    In [220]: np.genfromtxt(s, dtype=None, delimiter=",")
    /usr/local/bin/ipython3:1: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
      #!/usr/bin/python3
    Out[220]: 
    array((1, 1.3, b'abcde'),
          dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', 'S5')])
    In [221]: s = io.StringIO("1,1.3,abcde")
    In [222]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
    Out[222]: 
    array((1, 1.3, 'abcde'),
          dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])
    

    https://docs.scipy.org/doc/numpy/release.html#encoding-argument-for-text-io-functions
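The same encoding argument applies when reading from a file, which matters once the text is not plain ASCII. A sketch assuming NumPy 1.14 or newer; the temporary file and its contents are illustrative only:

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    # Write a small UTF-8 file containing a non-ASCII string (illustrative data).
    path = os.path.join(tmpdir, "sample.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("1,1.3,héllo\n")

    # encoding="utf-8" tells genfromtxt how to decode the file; with
    # dtype=None the string field then comes back as str ('U'), not bytes.
    data = np.genfromtxt(path, dtype=None, delimiter=",", encoding="utf-8")

print(data['f2'].item())
```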

    Now I can generate examples with Python 3 strings without producing all those ugly b'string' results (though I have to remember that not everyone has upgraded to 1.14):

    In [223]: s = """1,1.3,abcde
         ...: 4,1.3,two""".splitlines()
    In [224]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
    Out[224]: 
    array([(1, 1.3, 'abcde'), (4, 1.3, 'two')],
          dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])
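Since not everyone has upgraded, a test snippet could branch on the installed version. This is a hypothetical helper (`genfromtxt_str_lines` is not part of NumPy), assuming the older behavior described above, where Python 3 genfromtxt expects byte lines; on those versions string fields stay as bytes:

```python
import numpy as np

def genfromtxt_str_lines(lines, **kwargs):
    """Hypothetical helper: parse a list of str lines on old and new NumPy."""
    major, minor = (int(part) for part in np.__version__.split(".")[:2])
    if (major, minor) >= (1, 14):
        # encoding=None (available since 1.14) yields str ('U') fields
        return np.genfromtxt(lines, encoding=None, **kwargs)
    # Assumed older behavior: byte lines are required, and S fields result
    return np.genfromtxt([line.encode("ascii") for line in lines], **kwargs)

s = """1,1.3,abcde
4,1.3,two""".splitlines()
data = genfromtxt_str_lines(s, dtype=None, delimiter=",")
print(data)
```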