I'm trying to do a very simple delete on a numpy dataset using
dataset = pd.read_csv('putty.log', sep=r'\s+', header=0)
badData = np.argwhere(np.isnan(dataset.loc[:,'Temp']))
np.delete(dataset, badData, 0)
but I get an error saying
ValueError: Shape of passed values is (8, 529292), indices imply (8, 536668)
Even if I simply do
np.delete(dataset, 14, 0)
I get
ValueError: Shape of passed values is (8, 536667), indices imply (8, 536668)
Of course 536667 should be the size of the new array, so what's the problem?
dataset.head(5)
count Fx Fy ... AngX AngY Temp
0 151 -342818.906 -13860.325 ... 1040 1052.0 176.0
1 152 -342869.781 -13268.041 ... 1039 1051.0 176.0
2 153 -343521.312 -13044.709 ... 1043 1053.0 176.0
3 154 -343697.343 -13502.697 ... 1040 1052.0 176.0
4 155 -343553.468 -13164.850 ... 1040 1052.0 176.0
[5 rows x 8 columns]
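The problem is that you are calling np.delete on a pandas DataFrame, not on a NumPy array. The shortened array that np.delete produces no longer matches the DataFrame's original index, hence the shape mismatch in the error.
You can either convert the dataset to a NumPy array, delete the rows, and put the result back into a DataFrame, or remove the rows with a pandas function built for that, such as drop or dropna.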
A simple example of the NumPy route, using random values and deleting the row at index 3:
>>> df
count Fx Fy A B AngX AngY Temp
0 0.835154 0.399818 0.813946 0.828186 0.418237 0.431655 0.114101 0.686881
1 0.882480 0.363054 0.298512 0.179800 0.689665 0.018929 0.477470 0.088163
2 0.217667 0.511877 0.283514 0.541611 0.748867 0.173256 0.738801 0.359404
3 0.820754 0.598249 0.361888 0.461686 0.027692 0.160760 0.322443 0.687293
4 0.666681 0.423966 0.613454 0.468823 0.171541 0.487825 0.825111 0.413490
>>> np_values = df.values
>>> np_new_values = np.delete(np_values, 3, 0)
>>> df = pd.DataFrame(np_new_values, columns=['count', 'Fx', 'Fy', 'A', 'B', 'AngX', 'AngY', 'Temp'])
>>> df
count Fx Fy A B AngX AngY Temp
0 0.835154 0.399818 0.813946 0.828186 0.418237 0.431655 0.114101 0.686881
1 0.882480 0.363054 0.298512 0.179800 0.689665 0.018929 0.477470 0.088163
2 0.217667 0.511877 0.283514 0.541611 0.748867 0.173256 0.738801 0.359404
3 0.666681 0.423966 0.613454 0.468823 0.171541 0.487825 0.825111 0.413490
>>>
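The same row can be removed without leaving pandas. The following is a minimal sketch, assuming a small random DataFrame as a stand-in for the real data (the seed and the names dropped and renumbered are only illustrative); it uses DataFrame.drop to remove the row labelled 3.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the column names mirror the example above.
rng = np.random.default_rng(0)
cols = ['count', 'Fx', 'Fy', 'A', 'B', 'AngX', 'AngY', 'Temp']
df = pd.DataFrame(rng.random((5, 8)), columns=cols)

# drop() removes rows by index label and returns a new DataFrame;
# the original df is left unchanged.
dropped = df.drop(index=3)

# Optionally renumber the rows 0..n-1, which is what the NumPy
# round trip does implicitly when the DataFrame is rebuilt.
renumbered = dropped.reset_index(drop=True)
```

Unlike the NumPy round trip, drop keeps the column names and dtypes, and the surviving rows keep their original labels unless you reset the index.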
Now assume you want to remove the rows where Temp is NaN. You can filter the rows with a boolean mask and create a new dataset, as simple as that:
>>> df
count Fx Fy A B AngX AngY Temp
0 0.320627 0.757144 0.633840 0.481710 0.553908 0.439086 0.745160 0.022574
1 0.029232 0.285503 0.832308 0.269803 0.367305 0.558367 0.811343 NaN
2 0.311669 0.958565 0.159508 0.642381 0.930498 0.738135 0.255059 0.109702
3 0.576281 0.686696 0.419363 0.914394 0.825495 0.999091 0.126657 0.731871
4 0.323572 0.186353 0.149007 0.436962 0.699664 0.910051 0.118339 0.070458
>>> df[df['Temp'].notnull()]
count Fx Fy A B AngX AngY Temp
0 0.320627 0.757144 0.633840 0.481710 0.553908 0.439086 0.745160 0.022574
2 0.311669 0.958565 0.159508 0.642381 0.930498 0.738135 0.255059 0.109702
3 0.576281 0.686696 0.419363 0.914394 0.825495 0.999091 0.126657 0.731871
4 0.323572 0.186353 0.149007 0.436962 0.699664 0.910051 0.118339 0.070458
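For completeness, here is a short sketch of two equivalent ways to drop the NaN rows, assuming a tiny hand-made DataFrame with only two of the columns (the names clean, same, bad_rows and also_clean are illustrative). dropna with subset restricts the NaN check to the Temp column, and the last variant mirrors the "find the bad rows, then delete them" approach from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with one missing Temp value.
df = pd.DataFrame({
    'Fx': [0.757144, 0.285503, 0.958565],
    'Temp': [0.022574, np.nan, 0.109702],
})

# Drop every row whose Temp is NaN; other columns are not inspected.
clean = df.dropna(subset=['Temp'])

# The boolean-mask filter gives the same result.
same = df[df['Temp'].notnull()]

# Closest to the original intent: collect the labels of the bad rows,
# then drop them by label.
bad_rows = df.index[df['Temp'].isna()]
also_clean = df.drop(index=bad_rows)
```

All three return a new DataFrame, so assign the result back (for example dataset = dataset.dropna(subset=['Temp'])) if you want to keep it.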