This is to understand things better; it is not an actual problem that I need to fix. A cStringIO object is supposed to emulate a string, a file, and also an iterator over the lines. Does it also emulate a buffer? In any case, ideally one should be able to construct a numpy array as follows:
import numpy as np
import cStringIO
c = cStringIO.StringIO('\x01\x00\x00\x00\x01\x00\x00\x00')
#Trying the iterator abstraction
b = np.fromiter(c,int)
# The above fails with: ValueError: setting an array element with a sequence.
#Trying the file abstraction
b = np.fromfile(c,int)
# The above fails with: IOError: first argument must be an open file
#Trying the sequence abstraction
b = np.array(c, int)
# The above fails with: TypeError: long() argument must be a string or a number
#Trying the string abstraction
b = np.fromstring(c)
#The above fails with: TypeError: argument 1 must be string or read-only buffer
b = np.fromstring(c.getvalue(), int) # does work
My question is: why does it behave this way?
The practical problem where this came up is the following: I have an iterator which yields tuples. I am interested in making a numpy array from one of the components of each tuple with as little copying and duplication as possible. My first cut was to keep writing the interesting component of each yielded tuple into a StringIO object and then use its memory buffer for the array. I can of course use getvalue(), but that will create and return a copy. What would be a good way to avoid the extra copying?
The problem seems to be that numpy doesn't like being given characters instead of numbers. Remember, in Python, single characters and strings have the same type; numpy must have some type detection going on under the hood, and takes '\x01' to be a nested sequence.
The other problem is that a cStringIO object iterates over its lines, not its characters.
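To make the line-versus-character point concrete, here is a small sketch. It uses Python 3's io.BytesIO as a stand-in for cStringIO.StringIO (that substitution is my assumption; the code in the question is Python 2), and the variable names are just for illustration.

```python
import io

# Python 3 stand-in for cStringIO.StringIO (an assumption; the question uses Python 2).
buf = io.BytesIO(b'\x01\x00\x00\x00\x01\x00\x00\x00')

# Iterating a file-like object yields lines. There is no newline in the
# payload, so the whole thing comes back as one item.
lines = list(buf)

# Reading one byte at a time is what gives per-byte access.
buf.seek(0)
first = buf.read(1)
```

So an iterator-based constructor sees a single 8-byte string rather than eight small values, which would be consistent with the "setting an array element with a sequence" error.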
Something like the following iterator should get around both of these problems:
def chariter(filelike):
    # Yield the integer value of each character, one read at a time.
    octet = filelike.read(1)
    while octet:
        yield ord(octet)
        octet = filelike.read(1)
Use it like so (note the seek!):
c.seek(0)
b = np.fromiter(chariter(c), int)
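For the practical problem in the question (one component of each yielded tuple, with little copying), here is a rough sketch of two options. It assumes Python 3 with io.BytesIO rather than cStringIO, and records() is a hypothetical stand-in for the real iterator, so treat it as an illustration rather than a drop-in answer.

```python
import io
import struct

import numpy as np


def records():
    # Hypothetical stand-in for the iterator in the question:
    # yields (label, value) tuples.
    for i in range(4):
        yield ("row%d" % i, i * 10)


# Option 1: hand the wanted component straight to fromiter,
# with no intermediate string buffer at all.
a = np.fromiter((value for _, value in records()), dtype=np.int64)

# Option 2: pack the values into a BytesIO, then view its memory.
buf = io.BytesIO()
for _, value in records():
    buf.write(struct.pack("<q", value))  # little-endian 64-bit integer

# getbuffer() exposes the underlying bytes without copying them,
# and frombuffer wraps that view instead of duplicating it.
b = np.frombuffer(buf.getbuffer(), dtype="<i8")
```

One caveat with the second option: while the view returned by getbuffer() is alive, the BytesIO cannot be resized, so finish all the writes before creating the array.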