numpy delete shape of passed value error


I'm trying to do a very simple delete on a numpy dataset using

import numpy as np
import pandas as pd

dataset = pd.read_csv('putty.log', sep=r'\s+', header=0)
badData = np.argwhere(np.isnan(dataset.loc[:,'Temp']))
np.delete(dataset, badData, 0)

but I get an error saying

ValueError: Shape of passed values is (8, 529292), indices imply (8, 536668)

Even if I simply do

np.delete(dataset, 14, 0)

I get

ValueError: Shape of passed values is (8, 536667), indices imply (8, 536668)

Of course 536667 should be the size of the new array, so what's the problem?

dataset.head(5)
  count           Fx          Fy  ...    AngX    AngY   Temp
0   151  -342818.906  -13860.325  ...    1040  1052.0  176.0
1   152  -342869.781  -13268.041  ...    1039  1051.0  176.0
2   153  -343521.312  -13044.709  ...    1043  1053.0  176.0
3   154  -343697.343  -13502.697  ...    1040  1052.0  176.0
4   155  -343553.468  -13164.850  ...    1040  1052.0  176.0
[5 rows x 8 columns]

Solution

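  • The problem is that you are calling np.delete on a pandas DataFrame, not on a NumPy array. The row does get removed from the underlying values, but the result then appears to be wrapped back into a DataFrame with the original index, which is why the error reports one length for the passed values and a different one for the indices.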

    You can either convert the dataset to a NumPy array, delete the rows, and build a new DataFrame from the result, or remove the rows with pandas itself.

    Option 1: Converting to numpy and then back to dataframe

    A simple example using random values, deleting the row at index 3:

    >>> df
          count        Fx        Fy         A         B      AngX      AngY      Temp
    0  0.835154  0.399818  0.813946  0.828186  0.418237  0.431655  0.114101  0.686881
    1  0.882480  0.363054  0.298512  0.179800  0.689665  0.018929  0.477470  0.088163
    2  0.217667  0.511877  0.283514  0.541611  0.748867  0.173256  0.738801  0.359404
    3  0.820754  0.598249  0.361888  0.461686  0.027692  0.160760  0.322443  0.687293
    4  0.666681  0.423966  0.613454  0.468823  0.171541  0.487825  0.825111  0.413490
    >>> np_values = df.values
    >>> np_new_values = np.delete(np_values, 3, 0)
    >>> df = pd.DataFrame(np_new_values, columns=['count', 'Fx', 'Fy', 'A', 'B', 'AngX', 'AngY', 'Temp'])
    >>> df
          count        Fx        Fy         A         B      AngX      AngY      Temp
    0  0.835154  0.399818  0.813946  0.828186  0.418237  0.431655  0.114101  0.686881
    1  0.882480  0.363054  0.298512  0.179800  0.689665  0.018929  0.477470  0.088163
    2  0.217667  0.511877  0.283514  0.541611  0.748867  0.173256  0.738801  0.359404
    3  0.666681  0.423966  0.613454  0.468823  0.171541  0.487825  0.825111  0.413490
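
For the original question, where the rows to delete are the ones with a NaN in Temp, the same round trip could look roughly like the sketch below. The small frame is a made-up stand-in for the real log (only three columns, with NaN values placed by hand), and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; the NaN positions are invented.
df = pd.DataFrame({
    "count": [151, 152, 153, 154, 155],
    "Fx": [-342818.906, -342869.781, -343521.312, -343697.343, -343553.468],
    "Temp": [176.0, np.nan, 176.0, np.nan, 176.0],
})

# Positions (not labels) of the rows whose Temp is NaN, as a flat 1-D array.
bad_rows = np.flatnonzero(df["Temp"].isna().to_numpy())

# Delete on the plain ndarray, then rebuild the frame with the original column names.
values = np.delete(df.to_numpy(), bad_rows, axis=0)
cleaned = pd.DataFrame(values, columns=df.columns)

print(cleaned.shape)
```

Two side effects of this route are worth keeping in mind: the rebuilt frame gets a fresh 0-based index, and converting a frame with mixed column types to a single array forces a common dtype (here the integer count column comes back as float).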

    Option 2: Filtering the dataframe

    Assume you want to remove the rows where Temp is NaN. Filter on that column with a boolean mask; the result is a new DataFrame, so assign it back (for example df = df[...]) if you want to keep it:

    >>> df
          count        Fx        Fy         A         B      AngX      AngY      Temp
    0  0.320627  0.757144  0.633840  0.481710  0.553908  0.439086  0.745160  0.022574
    1  0.029232  0.285503  0.832308  0.269803  0.367305  0.558367  0.811343       NaN
    2  0.311669  0.958565  0.159508  0.642381  0.930498  0.738135  0.255059  0.109702
    3  0.576281  0.686696  0.419363  0.914394  0.825495  0.999091  0.126657  0.731871
    4  0.323572  0.186353  0.149007  0.436962  0.699664  0.910051  0.118339  0.070458
    >>> df[df['Temp'].notnull()]
          count        Fx        Fy         A         B      AngX      AngY      Temp
    0  0.320627  0.757144  0.633840  0.481710  0.553908  0.439086  0.745160  0.022574
    2  0.311669  0.958565  0.159508  0.642381  0.930498  0.738135  0.255059  0.109702
    3  0.576281  0.686696  0.419363  0.914394  0.825495  0.999091  0.126657  0.731871
    4  0.323572  0.186353  0.149007  0.436962  0.699664  0.910051  0.118339  0.070458
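
As a hedged sketch of the pandas-only route, the snippet below shows the boolean filter next to two related calls: dropna with subset=, which should give the same rows, and drop with a positional lookup, which is one way to mirror the np.delete(dataset, 14, 0) attempt from the question. The frame and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the NaN positions are invented.
df = pd.DataFrame({
    "count": [151, 152, 153, 154, 155],
    "Temp": [176.0, np.nan, 176.0, 176.0, np.nan],
})

# Boolean filtering returns a new frame; assign it to keep the result.
filtered = df[df["Temp"].notnull()]

# dropna with subset= only looks for NaN in the listed columns.
dropped = df.dropna(subset=["Temp"])

# Remove a single row by position (here the third row),
# roughly what np.delete(arr, 2, 0) does for an array.
without_third = df.drop(df.index[2])

# Surviving rows keep their original labels; this renumbers them from 0.
renumbered = dropped.reset_index(drop=True)
```

None of these calls modifies df in place, so the original frame still has all five rows afterwards.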