python numpy dataset whitespace delimiter

Loading a dataset in Python (NumPy) when a variable number of spaces delimits columns


I have a large dataset containing numeric data, and in some of its rows a variable number of spaces separates the columns, like:

4 5 6
7  8    9
2 3 4

When I use this line:

dataset=numpy.loadtxt("dataset.txt", delimiter=" ")

I get this error:

ValueError: Wrong number of columns at line 2

How can I change the code to ignore multiple spaces as well?


Solution

  • The default for delimiter is 'any whitespace'. If you leave the delimiter argument out, loadtxt copes with multiple spaces.

    >>> from io import StringIO
    >>> dataset = StringIO('''\
    ... 4 5 6
    ... 7 8     9
    ... 2 3 4''')
    >>> import numpy
    >>> dataset_as_numpy = numpy.loadtxt(dataset)
    >>> dataset_as_numpy
    array([[ 4.,  5.,  6.],
           [ 7.,  8.,  9.],
           [ 2.,  3.,  4.]])
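
To see why the original call fails, here is a minimal sketch using the sample rows from the question. It assumes that an explicit delimiter=" " treats every single space as a field boundary, much like str.split(" "), so runs of spaces produce empty fields. The variable names are only illustrative.

```python
import io

import numpy as np

# Same sample rows as in the question, with uneven runs of spaces.
text = "4 5 6\n7  8    9\n2 3 4\n"

# Splitting on one space yields empty strings wherever spaces repeat,
# so the second row appears to have more than three fields.
row = "7  8    9"
fields_single_space = row.split(" ")
print(fields_single_space)

# Splitting with no argument collapses runs of whitespace.
fields_any_whitespace = row.split()
print(fields_any_whitespace)

# Leaving delimiter at its default makes loadtxt split on runs of whitespace.
data = np.loadtxt(io.StringIO(text))
print(data.shape)
```

The same applies when reading from disk: numpy.loadtxt("dataset.txt") with no delimiter argument should handle the file from the question.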