Search code examples
pythonpandasnumpypython-3.9

Create new column based on existing columns


I have a Pandas dataframe which look like this.

Deviated_price  standard_price   
744,600          789,276
693,600          789,276
693,600          735,216
                 735,216
744,600          735,216
                 735,216

I want to create a new column called net_standard_price. values for the net standard price will be based on Deviated_price and standard_price columns.

If Deviated price is not blank then net_standard_price should be blank. If Deviated price is blank then net_standard_price should contain standard_price value.

Net_standard_price should look like this.

Deviated_price  standard_price  Net_standard_price   
                 789,276           789,276
693,600          789,276
693,600          735,216
                 735,216           735,216
744,600          735,216
                 735,216           735,216

I tried below code using np.where but Net_standard_price is empty for all the records.

df['Net_standard_price'] = np.where(df['Deviated_price'] != '',
                                        '', df['standard_price'])

What's the most efficient way to do this?


Solution

  • Moving to numpy domain gave some performance boost

    import pandas as pd
    import numpy as np
    from timeit import Timer
    
    
    def make_df():
        random_state = np.random.RandomState()
        df = pd.DataFrame(random_state.random((10000, 2)), columns=['Deviated_price', 'standard_price'], dtype=str)
        df['Deviated_price'][random_state.randint(0, 2, len(df)).astype(np.bool)] = None
        return df
    
    
    def test1(df):
        df['Net_standard_price'] = np.where(df['Deviated_price'] != '',
                                            '', df['standard_price'])
    
    def test2(df):
        df['Net_standard_price'] = np.where(df['Deviated_price'].isna(), df['standard_price'], None)
    
    def test3(df):
        temp = df['standard_price'].values
        temp2 = df['Deviated_price'].values
        net_standard_price = temp.copy()
        net_standard_price[temp2 == ''] = ''
        df['Net_standard_price'] = net_standard_price
    
    timing = Timer(setup='df = make_df()', stmt='test1(df)', globals=globals()).timeit(500)
    print('test1: ', timing)
    
    timing = Timer(setup='df = make_df()', stmt='test2(df)', globals=globals()).timeit(500)
    print('test2: ', timing)
    
    timing = Timer(setup='df = make_df()', stmt='test3(df)', globals=globals()).timeit(500)
    print('test3: ', timing)
    
    test1:  0.42146812000000006
    test2:  0.417552648
    test3:  0.2913768969999999