Tags: python, arrays, numpy, typeerror, genfromtxt

np.genfromtxt multiple delimiters?


My file looks like this:

1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14
1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14

I want to import it into a NumPy array using np.genfromtxt. The main problem is that the file uses both ';' and ',' as delimiters. My attempt:

import numpy as np
import io

s = io.StringIO(open('2e70dfa1.csv').read().replace(';',','))

data = np.genfromtxt(s,dtype=int,delimiter=',')

I get this error:

TypeError: Can't convert 'bytes' object to str implicitly

How can I solve this? I'm also open to completely new (better) ideas.


Solution

  • According to the docs:

    Parameters:
    fname : file, str, pathlib.Path, list of str, generator
        File, filename, list, or generator to read. If the filename extension is gz or bz2, the file is first decompressed. Note that generators must return byte strings in Python 3k. The strings in a list or produced by a generator are treated as lines.

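    It is probably easier and more efficient to pass genfromtxt a generator, bearing in mind that it must yield byte strings. Opening the file in binary mode ('rb') makes each line a bytes object, so the arguments to replace must be bytes as well: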

    >>> with open('2e70dfa1.csv', 'rb') as f:
    ...     clean_lines = (line.replace(b';',b',') for line in f)
    ...     data = np.genfromtxt(clean_lines, dtype=int, delimiter=',')
    ...
    >>> data
    array([[1497484825,      34425,         -4,         28,        -14,
                    -4,         28,        -14,         -4,         28,
                   -14,         -4,         28,        -14,         -4,
                    28,        -14,         -4,         28,        -14],
           [1497484837,      34476,         -4,         28,        -14,
                    -4,         28,        -14,         -4,         28,
                   -14,         -4,         28,        -14,         -4,
                    28,        -14,         -4,         28,        -14]])
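
The same approach can be tried without the CSV file on disk. The following is a minimal sketch, assuming the two sample rows from the question are held in memory as byte strings; the names raw_lines and normalize_delimiters are hypothetical and chosen only for illustration.

```python
import numpy as np

# Two sample rows copied from the question, held in memory as byte strings
# (assumption: the real file contains lines of exactly this form).
raw_lines = [
    b"1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14",
    b"1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14",
]


def normalize_delimiters(lines):
    """Yield each line with every ';' replaced by ',' (hypothetical helper)."""
    for line in lines:
        yield line.replace(b";", b",")


# genfromtxt consumes the generator one line at a time, so the whole
# input never has to be rewritten in memory first.
data = np.genfromtxt(normalize_delimiters(raw_lines), dtype=int, delimiter=",")
print(data.shape)
```

Each row has 2 leading fields plus 6 groups of 3 values, so the result should have 20 columns. Replacing raw_lines with a file object opened in 'rb' mode gives the file-based version shown above.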