
How to split an array using its minimum entry


I am trying to split a dataset into two separate parts at the minimum of its first column. I used idxmin to find the location of the minimum entry, and then iloc to slice the array from 0 to that point.

The error I encounter is:

TypeError: cannot do positional indexing on RangeIndex with these indexers [1 96

dtype: int64] of type Series

An example dataset is as shown:

            x  y
0    1.000000  6
1    1.000000  2
2    0.999999  5
3    0.999996  3
4    0.999986  4
..        ...  ..
196  0.999987  3
197  0.999996  3
198  0.999999  2
199  1.000000  1
200  1.000000  4

The x column starts at 1, decreases to a minimum near zero, and then increases back to 1. I am looking for the smallest x and its corresponding y value so that I can separate the data into the two parts on either side of it.

This is the current code I have written:

data = pd.DataFrame(data)

minimum = pd.DataFrame.idxmin(data)

lower_surface = data.iloc[:minimum]

I understand that the variable minimum holds a location in the DataFrame, so I thought I could use iloc to slice the array from the beginning to the minimum point, but this does not work.


Solution

  • You should pick one column as the reference. Called on the whole DataFrame, idxmin returns a Series with one index label per column, which cannot be used as a slice bound:

    data.idxmin()
    
    x      4
    y    199
    dtype: int64
    

    You should instead run:

    minimum = data['x'].idxmin()
    

    Also, technically you have to use loc to slice, not iloc, since idxmin returns an index label, not a position.

    data.loc[:minimum]
    

    Output:

              x  y
    0  1.000000  6
    1  1.000000  2
    2  0.999999  5
    3  0.999996  3
    4  0.999986  4
    

    If you want to slice with iloc, you need a position rather than a label, which numpy.argmin provides:

    import numpy as np
    
    data.iloc[:np.argmin(data['x'])]
    

    The output is slightly different, however, because iloc excludes the end of the slice while loc includes it:

              x  y
    0  1.000000  6
    1  1.000000  2
    2  0.999999  5
    3  0.999996  3
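
To tie the two approaches together, here is a minimal sketch of the full split. The data is made up for illustration (a short x column that falls to a minimum and rises again), and the names min_label, min_pos, first_half and second_half are hypothetical, not from the question. It assumes the default RangeIndex, where labels and positions happen to coincide.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x falls from 1 to a minimum of 0, then rises back to 1.
x = np.concatenate([np.linspace(1.0, 0.0, 5), np.linspace(0.25, 1.0, 4)])
data = pd.DataFrame({"x": x, "y": np.arange(len(x))})

# Index label of the row with the smallest x (a scalar, not a Series).
min_label = data["x"].idxmin()

# loc slices by label and includes both endpoints,
# so the minimum row appears in both halves here.
first_half = data.loc[:min_label]
second_half = data.loc[min_label:]

# iloc slices by position and excludes the stop,
# so add 1 to keep the minimum row in the first half.
min_pos = int(np.argmin(data["x"].to_numpy()))
first_half_iloc = data.iloc[: min_pos + 1]
```

If the minimum row should belong to only one half, one option is to start the second slice one position later, for example data.iloc[min_pos + 1:].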