Search code examples
pythonpandasdataframeunique

Output unique values from a pandas dataframe without reordering the output


I know that a few posts have been made regarding how to output the unique values of a dataframe without reordering the data.

I have tried many times to implement these methods, however, I believe that the problem relates to how the dataframe in question has been defined.

Basically, I want to look into the dataframe named "C", and output the unique values into a new dataframe named "C1", without changing the order in which they are stored at the moment.

The line that I use currently is:

C1 = pd.DataFrame(np.unique(C))

However, this returns an ascending order list (while, I simply want the list order preserved only with duplicates removed).

Once again, I apologise to the advanced users who will look at my code and shake their heads -- I'm still learning! And, yes, I have tried numerous methods to solve this problem (redefining the C dataframe, converting the output to be a list etc), to no avail unfortunately, so this is my cry for help to the Python gods. I defined both C and C1 as dataframes, as I understand that these are pretty much the best datastructures to house data in, such that they can be recalled and used later, plus it is quite useful to name the columns without affecting the data contained in the dataframe).

Once again, your help would be much appreciated.

F0 = ('08/02/2018','08/02/2018',50)
F1 = ('08/02/2018','09/02/2018',52)
F2 = ('10/02/2018','11/02/2018',46)
F3 = ('12/02/2018','16/02/2018',55)
F4 = ('09/02/2018','28/02/2018',48)
F_mat = [[F0,F1,F2,F3,F4]]
F_test = pd.DataFrame(np.array(F_mat).reshape(5,3),columns=('startdate','enddate','price'))

#convert string dates into DateTime data type
F_test['startdate'] = pd.to_datetime(F_test['startdate'])
F_test['enddate'] = pd.to_datetime(F_test['enddate'])

#convert datetype to be datetime type for columns startdate and enddate
F['startdate'] = pd.to_datetime(F['startdate'])
F['enddate'] = pd.to_datetime(F['enddate'])

#create contract duration column
F['duration'] = (F['enddate'] - F['startdate']).dt.days + 1

#re-order the F matrix by column 'duration', ensure that the bootstrapping 
#prioritises the shorter term contracts 
F.sort_values(by=['duration'], ascending=[True])

# create prices P
P = pd.DataFrame()
for index, row in F.iterrows():
    new_P_row = pd.Series()
    for date in pd.date_range(row['startdate'], row['enddate']):
        new_P_row[date] = row['price']
    P = P.append(new_P_row, ignore_index=True)

P.fillna(0, inplace=True)

#create C matrix, which records the unique day prices across the observation interval
C = pd.DataFrame(np.zeros((1, intNbCalendarDays)))
C.columns = tempDateRange 

#create the Repatriation matrix, which records the order in which contracts will be 
#stored in the A matrix, which means that once results are generated 
#from the linear solver, we know exactly which CalendarDays map to 
#which columns in the results array
#this array contains numbers from 1 to NbContracts
R = pd.DataFrame(np.zeros((1, intNbCalendarDays)))
R.columns = tempDateRange

#define a zero filled matrix, P1, which will house the dominant daily prices 
P1 = pd.DataFrame(np.zeros((intNbContracts, intNbCalendarDays)))
#rename columns of P1 to be the dates contained in matrix array D
P1.columns = tempDateRange 

#create prices in correct rows in P
for i in list(range(0, intNbContracts)):
    for j in list(range(0, intNbCalendarDays)):
        if (P.iloc[i, j] != 0 and C.iloc[0,j] == 0) :
            flUniqueCalendarMarker = P.iloc[i, j]
            C.iloc[0,j] = flUniqueCalendarMarker
            P1.iloc[i,j] = flUniqueCalendarMarker
            R.iloc[0,j] = i
            for k in list(range(j+1,intNbCalendarDays)):
                if (C.iloc[0,k] == 0 and P.iloc[i,k] != 0):
                    C.iloc[0,k] = flUniqueCalendarMarker
                    P1.iloc[i,k] = flUniqueCalendarMarker
                    R.iloc[0,k] = i
        elif (C.iloc[0,j] != 0 and P.iloc[i,j] != 0):
            P1.iloc[i,j] = C.iloc[0,j]

#convert C dataframe into C_list, in prepataion for converting C_list
#into a unique, order preserved list
C_list = C.values.tolist()

#create C1 matrix, which records the unique day prices across unique days in the observation period
C1 = pd.DataFrame(np.unique(C))

Solution

  • Use DataFrame.duplicated() to check if your data-frame contains any duplicate or not. If yes then you can try DataFrame.drop_duplicate() .