
Using genfromtxt to read in data with different column lengths


I have the following data set, in which the rows have different numbers of columns:

5.0     0.4     0.92    11.45   44.18   33.66
3.2     4.92    7.2     11.73   46.98   118.63
3.6     11.43   14.32   8.88    71.3
1.99    9.12    11.71   15.56   20.24   
0.77    21.92   2.47    33.99   80.68
0.91    4.32    14.6    15.69   127.8
2.67    2.1     5.14    7.96    46.88
0.76    0.44    5.46    71.13   16.62
3.52    1.15    6.21    31.84   10.33
0.93    2.29    0.83    58.0    18.32
0.56    1.61    5.09    20.07   10.1
0.02    1.23    5.95    16.24
1.5     3.23    4.21    18.9

I have tried using genfromtxt, but it is returning only the first row in the data.

data = np.genfromtxt(filename,dtype=float,usecols=range(6))

Is there an argument I am missing that can fix this? If I don't use the usecols argument, the data is returned as one column instead. Setting delimiter='' returned the same result. Ideally I would like to read in the data and then separate it by column.


Solution

  • A NumPy array must be rectangular, so genfromtxt is not designed for rows of varying length. For such data pandas is probably easier to use, since it fills missing values with NaN by default:

    In [6]: df = pd.read_csv('file.txt', sep=r'\s+', engine='python', header=None)

    In [7]: df
    Out[7]: 
           0      1      2      3       4       5
    0   5.00   0.40   0.92  11.45   44.18   33.66
    1   3.20   4.92   7.20  11.73   46.98  118.63
    2   3.60  11.43  14.32   8.88   71.30     NaN
    3   1.99   9.12  11.71  15.56   20.24     NaN
    4   0.77  21.92   2.47  33.99   80.68     NaN
    5   0.91   4.32  14.60  15.69  127.80     NaN
    6   2.67   2.10   5.14   7.96   46.88     NaN
    7   0.76   0.44   5.46  71.13   16.62     NaN
    8   3.52   1.15   6.21  31.84   10.33     NaN
    9   0.93   2.29   0.83  58.00   18.32     NaN
    10  0.56   1.61   5.09  20.07   10.10     NaN
    11  0.02   1.23   5.95  16.24     NaN     NaN
    12  1.50   3.23   4.21  18.90     NaN     NaN
    

    You can get back to a NumPy array with df.values (or df.to_numpy() in newer pandas versions).
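
To make the round trip concrete, here is a minimal, self-contained sketch of the approach above. It reads from an in-memory string instead of a file; the name `raw` and its three shortened rows are stand-ins for the real data, not part of the original question. It also shows one possible way to separate the result into per-column arrays with the NaN padding removed.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the file in the question.
raw = """\
5.0     0.4     0.92    11.45   44.18   33.66
3.6     11.43   14.32   8.88    71.3
0.02    1.23    5.95    16.24
"""

# Split on runs of whitespace; rows with fewer fields are padded with NaN.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Rectangular 2-D float array, with NaN where a row had no value.
data = df.to_numpy()

# One 1-D array per column, with the NaN padding dropped.
columns = [df[c].dropna().to_numpy() for c in df.columns]

print(data.shape)
print([len(c) for c in columns])
```

Note that this relies on the first row being the longest one, as it is in the question's data, because with header=None the number of columns is taken from the first line. If a later row could be longer, passing explicit column names via the names argument is probably the safer choice.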