I would like to correctly read Unnamed columns from previously generated csv file.
The csv has a multiindex header with different shape for each column, ie, say, column 0 has one coordinate, column 1 has two and column 2 has three. Thus, when reading the full csv, Unnamed columns appear in column 0 and column 1. I need to locate a row with a given value but I cannot pass the correct column.
Imagine 'my.csv' file looks like this:
C0,C1,C2
,C10,C20
,,C22
-2,0,0.4101
-1,1,0.8058
0,0,0.1000
1,3,0.3846
I try to read it:
read_df = pd.read_csv('my.csv', header=[0,1,2])
print(read_df)
C0 C1 C2
Unnamed: 0_level_1 C10 C20
Unnamed: 0_level_2 Unnamed: 1_level_2 C22
0 -2 0 0.4101
1 -1 1 0.8058
2 0 0 0.1000
3 1 3 0.3846
I need to select the rows corresponding to a value of C0, but I end up either in KeyError or in TypeError, such as in the following:
read_df.loc( read_df[('C0','','')] == 0 ) ## KeyError: ('C0', '', '')
read_df.loc( read_df[('C0','0_level_1','0_level_2')] == 0 ) ## KeyError: ('C0', '0_level_1', '0_level_2')
read_df.loc( read_df[('C0', 'Unnamed: 0_level_1', 'Unnamed: 0_level_2')] == 0 ) ## TypeError: 'Series' objects are mutable, thus they cannot be hashed
One turnaround could be to name the columns before hand and prevent this to happen, but it would be nice to understand how to manage this.
Thanks
PS here's how I generate 'my.csv'
import pandas as pd
import random as rnd
col_0 = [('C0', '', '')]
col_1 = [('C1', 'C10','')]
col_2 = [('C2', 'C20', 'C22')]
tot_col = columns=pd.MultiIndex.from_tuples(col_0 + col_1 + col_2)
tot_df = pd.DataFrame(columns=tot_col)
def get_data():
data_dict = { ('C1','C10','') : rnd.randint(0,5),
('C2','C20','C22') : '{:2.4f}'.format(rnd.random()) }
data_df = pd.DataFrame( [data_dict], columns=tot_col )
return(data_df)
for ii in range(-2, 2):
ii_df = get_data()
ii_df[('C0','','')] = ii
tot_df = pd.concat([tot_df, ii_df], ignore_index=True)
tot_df.to_csv('my.csv', index=False)
IIUC you're looking for
read_df.loc[ read_df[('C0', 'Unnamed: 0_level_1', 'Unnamed: 0_level_2')] == 0 ]
or simply
read_df[read_df[('C0', 'Unnamed: 0_level_1', 'Unnamed: 0_level_2')] == 0]
Result:
C0 C1 C2
Unnamed: 0_level_1 C10 C20
Unnamed: 0_level_2 Unnamed: 1_level_2 C22
2 0 4 0.2373
read_df.columns = pd.MultiIndex.from_tuples([tuple('' if y.startswith('Unnamed:') else y for y in x) for x in map(list, read_df.columns.tolist())])
After this you can use as usual:
read_df[read_df[('C0','','')] == 0]
Result:
C0 C1 C2
C10 C20
C22
2 0 4 0.2373