CSV reader issue in Enthought Canopy


I'm trying to read a CSV file. The file is so large that I have had to use an error handler, and within that handler I have to call csv.field_size_limit(). That call does not work even by itself: I keep receiving a 'limit must be an integer' error. From further research, I have found that this is probably an installation problem. I installed all third-party tools using the Package Manager, so I am not sure what could be going wrong. Any ideas about how to correct this issue?

import sys
import csv
maxInt = sys.maxsize
decrement = True
while decrement:
    decrement = False
    try:
        csv.field_size_limit(maxInt)
    except OverflowError:
        maxInt = int(maxInt/10)
        decrement = True
with open("Data.csv", 'rb') as textfile:
    text = csv.reader(textfile, delimiter=" ", quotechar='|')
    for line in text:
        print ' '.join(line)

Solution

  • Short answer: I am guessing that you are on 64-bit Windows. If so, try using sys.maxint instead of sys.maxsize. You may still run into problems, though, because I think csv.field_size_limit() will try to preallocate memory of that size. What you really want is to estimate the largest field size you actually need and perhaps double it; both sys.maxint and sys.maxsize are far larger than necessary.

    Long explanation: In Python 2, int objects store C long integers. On all relevant 32-bit platforms, both pointers (memory offsets) and C long integers are 32 bits wide. On most UNIX-like 64-bit platforms, both are 64 bits wide. 64-bit Windows, however, keeps C long integers at 32 bits while widening pointers to 64 bits. sys.maxint is the largest Python int (and thus the largest C long), while sys.maxsize is the largest memory offset. Consequently, on 64-bit Windows, sys.maxsize is a Python long because the Python int type cannot hold a number that large. I suspect that csv.field_size_limit() requires a number that fits in a bona fide Python int object. That is why you get the OverflowError and the 'limit must be an integer' errors.
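
The advice above (estimate the field size, double it, and stay within a C long) could be sketched roughly as follows. This is only an illustration under stated assumptions: set_field_limit and C_LONG_MAX are hypothetical names that do not come from the csv module, and the 1 MB estimate is a made-up figure that you would replace with one based on your own data. The width of a C long is read through ctypes so that the cap is 2**31 - 1 on 64-bit Windows and larger on most UNIX-like 64-bit platforms.

```python
import csv
import ctypes

# Largest value a C long can hold on this platform
# (2**31 - 1 on 64-bit Windows, 2**63 - 1 on most 64-bit UNIX-like systems).
C_LONG_MAX = int(2 ** (8 * ctypes.sizeof(ctypes.c_long) - 1) - 1)


def set_field_limit(estimated_field_size):
    """Hypothetical helper: set the csv field limit to twice the estimate,
    capped at C_LONG_MAX, and return the limit that was applied."""
    # int() matters on Python 2: it converts a `long` that fits in a C long
    # back into a plain int, which is the type the csv module checks for.
    limit = int(min(2 * estimated_field_size, C_LONG_MAX))
    csv.field_size_limit(limit)
    return limit


# Assumed estimate: the largest field is about 1 MB.
applied = set_field_limit(1024 * 1024)
```

Calling csv.field_size_limit() with no arguments afterwards returns the limit currently in effect, which is a quick way to confirm that the call succeeded before opening the file.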