python numpy python-2.7 pandas data-analysis

How to read from an array without a particular column in Python

I have a NumPy array of dtype=object whose elements are lists (each containing values of various data types). Since it is an array of lists, I think of it as a 2D array, though I'm not sure that's accurate. I want to copy every row, but only certain columns, of this array into another array. I filled the array from a CSV file that has several fields (columns) and a large number of rows. Here's the code I used to store the data in the array:

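import numpy as np

data = np.zeros((401125,), dtype=object)   # 1D array, one slot per CSV row
for i, row in enumerate(csv_file_object):  # csv_file_object: a csv.reader
    data[i] = row                          # each slot holds a whole list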
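Because each slot of `data` holds a whole list, NumPy treats `data` as a one-dimensional array, not a 2D table. The sketch below reproduces the loop above on two made-up rows (the sample values are hypothetical, standing in for the CSV) and shows the resulting shape, plus one way to rebuild it as a true 2D object array in which columns can be indexed:

```python
import numpy as np

# Hypothetical sample rows standing in for the CSV contents
rows = [['1', 'none', '2', 'gona', '5.3'],
        ['2', '34', '2', 'gina', '5.5']]

# Same pattern as the question: a 1D object array whose slots hold lists
data = np.zeros((len(rows),), dtype=object)
for i, row in enumerate(rows):
    data[i] = row
print(data.shape)  # (2,) -> one dimension; each element is a whole list

# Rebuild it as a true 2D object array: one cell per field
table = np.array(data.tolist(), dtype=object)
print(table.shape)  # (2, 5)
```

This only yields a 2D result when every row has the same number of fields; ragged rows would stay a 1D array of lists.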

The data looks roughly like this:

column1  column2  column3  column4  column5 ....
1         none     2       'gona'    5.3
2         34       2       'gina'    5.5
3         none     2       'gana'    5.1
4         43       2       'gena'    5.0
5         none     2       'guna'    5.7
.....     ....   .....      .....    ....
.....     ....   .....      .....    ....
.....     ....   .....      .....    ....

There are unwanted fields in the middle that I want to remove. Suppose I don't want column3: how do I remove just that column from my array, or copy only the relevant columns into another array?

Solution

Use pandas. For mixed-type data like yours, a pandas.DataFrame is likely a better fit than a NumPy object array, since each column keeps its own dtype.

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3
import pandas as pd

data = """column1  column2  column3  column4  column5
1         none     2       'gona'    5.3
2         34       2       'gina'    5.5
3         none     2       'gana'    5.1
4         43       2       'gena'    5.0
5         none     2       'guna'    5.7"""

# Parse whitespace-separated columns, then drop column3 (axis=1 means columns)
df = pd.read_csv(StringIO(data), sep=r'\s+').drop('column3', axis=1)
print(df)

out:

   column1 column2 column4  column5
0        1    none  'gona'      5.3
1        2      34  'gina'      5.5
2        3    none  'gana'      5.1
3        4      43  'gena'      5.0
4        5    none  'guna'      5.7
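If you would rather stay with NumPy alone, a genuinely 2D object array supports column removal directly. A minimal sketch, assuming the CSV rows are equal-length lists of strings (as csv.reader returns them); the sample values are based on the question's table: `np.delete` drops column index 2 (column3), and indexing with a list of column indices copies only the columns you keep.

```python
import numpy as np

# Sample rows based on the question's table, all strings as csv.reader gives them
rows = [['1', 'none', '2', 'gona', '5.3'],
        ['2', '34', '2', 'gina', '5.5'],
        ['3', 'none', '2', 'gana', '5.1']]
table = np.array(rows, dtype=object)  # shape (3, 5)

# Option 1: delete column index 2 (column3); returns a new array
without_col3 = np.delete(table, 2, axis=1)

# Option 2: copy only the wanted columns by index (fancy indexing copies)
keep = [0, 1, 3, 4]
selected = table[:, keep]

print(without_col3.shape)    # (3, 4)
print(selected[0].tolist())  # ['1', 'none', 'gona', '5.3']
```

Both options leave `table` untouched; the column-list form is handy when several columns scattered through the middle need to go.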

If you need a NumPy array instead of a DataFrame, use the to_records() method, which returns a record array with one named field per column:

df.to_records(index=False)  # df is the DataFrame with column3 dropped
#output:
rec.array([(1L, 'none', "'gona'", 5.3),
           (2L, '34', "'gina'", 5.5),
           (3L, 'none', "'gana'", 5.1),
           (4L, '43', "'gena'", 5.0),
           (5L, 'none', "'guna'", 5.7)], 
            dtype=[('column1', '<i8'), ('column2', '|O4'),
                   ('column4', '|O4'), ('column5', '<f8')])
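A record array like this can also have fields removed by name without going back through pandas. A minimal sketch, assuming NumPy's `numpy.lib.recfunctions.drop_fields` helper (called with `usemask=False` so it returns a plain structured array rather than a masked one); the small array is built by hand to mirror the fields above and is purely illustrative:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hand-built structured array with the question's field names (illustrative data)
arr = np.array([(1, 'none', 2, 'gona', 5.3),
                (2, '34', 2, 'gina', 5.5)],
               dtype=[('column1', 'i8'), ('column2', 'O'), ('column3', 'i8'),
                      ('column4', 'O'), ('column5', 'f8')])

# Drop a field by name; usemask=False returns an ordinary structured ndarray
trimmed = rfn.drop_fields(arr, 'column3', usemask=False)
print(trimmed.dtype.names)  # ('column1', 'column2', 'column4', 'column5')
```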