Tags: python, pandas, list, dataframe, list-manipulation

Get complete list up to the last unique value using Python and Pandas


My goal is to get the complete list up to and including the first occurrence of the last unique value. Ideally, I would like one method in plain Python and one using Pandas, but a single solution will work fine. I also need to preserve the ordering of the list.

In my example below, the last unique value happens to be the largest value in the list. This is not necessarily true for my application: the last unique value can be the smallest, the largest, or anything in between.

Below I show the progress I have made so far.

import pandas as pd

data_dict = {"RAW": [4000076160, 5354368, 4641792, 4641792, 4289860736, 982783232, 2122384768,
                     4136386944, 5440384, 4772864, 4772864, 4289881216,                     
                     4270354816, 4293477248, 4286243840, 4286243840, 3400832, 982783232, 2122384768],
             "ADC_TYPE": [3, 7, 8, 8, 9, 10, 11,
                          3, 7, 8, 8, 9,                          
                          3, 7, 8, 8, 9, 10, 11]}

df = pd.DataFrame(data_dict)
print(df)

The returned DataFrame (i.e. df):

           RAW  ADC_TYPE
0   4000076160         3
1      5354368         7
2      4641792         8
3      4641792         8
4   4289860736         9
5    982783232        10
6   2122384768        11
7   4136386944         3
8      5440384         7
9      4772864         8
10     4772864         8
11  4289881216         9
12  4270354816         3
13  4293477248         7
14  4286243840         8
15  4286243840         8
16     3400832         9
17   982783232        10
18  2122384768        11

I can use the following piece of code, but it will not return the complete list up to the last unique value.

unique_types = df["ADC_TYPE"].unique().tolist()  # return type is python list
print(unique_types)

Which returns:

[3, 7, 8, 9, 10, 11]

My goal is to return:

[3, 7, 8, 8, 9, 10, 11]

I have searched through this forum and Google, but I have not found a solution to my problem thus far. I have found several examples that return a list of unique values, but not an example that returns the complete list up until the last unique value. Thanks!


Solution

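  • You can use idxmax() to find the index of the first occurrence of the max value, then use iloc to slice the dataframe up to and including that row (the +1 is needed because the end of a slice is exclusive). This relies on the last unique value also being the maximum, and on the default 0-based index, as in the example data.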

    df.iloc[:df['ADC_TYPE'].idxmax()+1,1].tolist()
    
    [3, 7, 8, 8, 9, 10, 11]
    

    Or, operating on just the column in question, to get the same result:

    df['ADC_TYPE'][:df['ADC_TYPE'].idxmax()+1].tolist()
    

    Updated version that does not assume the last unique value is the maximum (10 and 11 are swapped in the first cycle to demonstrate):

    data_dict = {"RAW": [4000076160, 5354368, 4641792, 4641792, 4289860736, 982783232, 2122384768,
                         4136386944, 5440384, 4772864, 4772864, 4289881216,                     
                         4270354816, 4293477248, 4286243840, 4286243840, 3400832, 982783232, 2122384768],
                 "ADC_TYPE": [3, 7, 8, 8, 9, 11, 10,
                              3, 7, 8, 8, 9,                          
                              3, 7, 8, 8, 9, 10, 11]}
    
    df = pd.DataFrame(data_dict)
    
               RAW  ADC_TYPE
    0   4000076160         3
    1      5354368         7
    2      4641792         8
    3      4641792         8
    4   4289860736         9
    5    982783232        11
    6   2122384768        10
    7   4136386944         3
    8      5440384         7
    9      4772864         8
    10     4772864         8
    11  4289881216         9
    12  4270354816         3
    13  4293477248         7
    14  4286243840         8
    15  4286243840         8
    16     3400832         9
    17   982783232        10
    18  2122384768        11
    
    # Collect the unique values in the order they first appear
    vals = []
    for i in df['ADC_TYPE']:
        if i not in vals:
            vals.append(i)
    print(vals)
    # Take the last value from that list (the last unique value)
    last_unique = vals.pop()
    print(last_unique)
    # Find the index of the first occurrence of that value
    idx = (df['ADC_TYPE'] == last_unique).idxmax()
    print(idx)
    # Slice as before to get the values up to and including that index
    up_to_last = df.iloc[:idx + 1, 1].tolist()
    print(up_to_last)
    
    
    [3, 7, 8, 9, 11, 10]
    10
    6
    [3, 7, 8, 8, 9, 11, 10]
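
Since the question asks for both a Pandas and a plain-Python approach, here is one possible sketch of each. It is not part of the original answer, and the function names up_to_last_unique_pandas and up_to_last_unique_python are made up for illustration. The Pandas version assumes that Series.duplicated() (which flags every repeat of an earlier value) is an acceptable way to locate first appearances, and it works by position, so it should not depend on the index labels.

```python
import pandas as pd


def up_to_last_unique_pandas(series: pd.Series) -> list:
    """Values up to and including the first occurrence of the last new value."""
    # ~duplicated() is True exactly where a value appears for the first time.
    first_seen = ~series.duplicated()
    if not first_seen.any():  # empty input
        return []
    # Position of the last True: reverse the mask and take the first True.
    last_pos = len(series) - 1 - first_seen.to_numpy()[::-1].argmax()
    return series.iloc[: last_pos + 1].tolist()


def up_to_last_unique_python(values) -> list:
    """Same result without Pandas, in a single pass over the values."""
    values = list(values)
    seen = set()
    last_pos = -1  # position where a previously unseen value last appeared
    for pos, value in enumerate(values):
        if value not in seen:
            seen.add(value)
            last_pos = pos
    return values[: last_pos + 1]


# The ADC_TYPE column from the unsorted example above.
adc_type = [3, 7, 8, 8, 9, 11, 10,
            3, 7, 8, 8, 9,
            3, 7, 8, 8, 9, 10, 11]

print(up_to_last_unique_pandas(pd.Series(adc_type)))
print(up_to_last_unique_python(adc_type))
```

Both calls should print [3, 7, 8, 8, 9, 11, 10] for this data. The plain-Python version uses a set for membership checks, which assumes the values are hashable.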