python-3.x, pandas, dataframe, calculated-columns

Assigning categorical values in a new column based on numeric values from another column in a pandas DataFrame


I have a dataframe that has a column (say "Total") with numeric data. The data in this column can be positive, negative, or zero. No range limit on either side of zero.

I want to create another column holding indicator or categorical values derived from the value in this 'Total' column.

For example (Objectives):

  1. P/N/Z based on Positive/Negative/Zero value in the 'Total' column.
  2. Classes such as "...0-10000,10000-20000,..." based on value in 'Total' column
  3. +1,-1,0 based on value greater than, less than, equal to a specific value in 'Total' column.

At the moment I do this by iterating over each value in the 'Total' column, passing it through an if-else function, collecting the results in a separate list, and then adding that list to the dataframe as a new column.

values = []  # results collected one value at a time
for each in df['Total']:
    values.append(cat1(each))
df['newcol'] = values

Here cat1 is the function that returns P/N/Z depending on whether each value is positive, negative, or zero, and values is the list that the for-loop fills. Similarly, I have functions for 2 and 3 from the objectives above.

def cat1(value):
    if value > 0:
        return "P"
    elif value < 0:
        return "N"
    else:
        return "Z"

Is there a simpler and faster alternative?

Thank you for the help.


Solution

  • I don't know whether this approach is any quicker, but it makes better use of pandas functionality:

    def cat1(value):
        if value > 0:
            return "P"
        elif value < 0:
            return "N"
        else:
            return "Z"
    df['newcol'] = df.apply(lambda row: cat1(row['Total']), axis=1)
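
If speed matters, the three objectives from the question can probably also be handled without a per-row Python function, using vectorized NumPy and pandas calls. The following is only a sketch under assumptions: the sample data, the new column names (sign_label, range_label, vs_threshold), the class width of 10000, and the threshold value are all made up for illustration, and 'Total' is assumed to be non-empty with no missing values. The range labels are written as "a to b" rather than "a-b" so that negative bounds stay readable.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'Total' column name comes from the question.
df = pd.DataFrame({"Total": [-15000, -5, 0, 7, 12000, 25000]})

# Objective 1: P/N/Z from the sign of 'Total'.
# np.select takes the first matching condition; the default covers zero.
df["sign_label"] = np.select(
    [df["Total"] > 0, df["Total"] < 0],
    ["P", "N"],
    default="Z",
)

# Objective 2: fixed-width classes. Because the data has no fixed range,
# the bin edges are derived from the observed minimum and maximum.
step = 10000  # assumed class width
lo = (df["Total"].min() // step) * step
hi = (df["Total"].max() // step + 1) * step
edges = np.arange(lo, hi + step, step)
labels = [f"{a} to {b}" for a, b in zip(edges[:-1], edges[1:])]
# right=False makes each class include its lower edge: [a, b)
df["range_label"] = pd.cut(df["Total"], bins=edges, labels=labels, right=False)

# Objective 3: +1 / -1 / 0 relative to a chosen value.
threshold = 7  # assumed comparison value
df["vs_threshold"] = np.sign(df["Total"] - threshold).astype(int)

print(df)
```

Whether this is actually faster than apply depends on the size of the dataframe, so it is worth timing both on real data before switching.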