Search code examples
pythonexcelpandasrowsdrop

Drop a row based on empty/blanks values in specific column in final excel file - Pandas Data frame


I am trying to drop rows based on the empty values in a specific column of excel after joining the data for both excels. I have tried some queries mentioned in stack overflow, but the desire results are not achieved. Kindly support and thanks

First Excel File Named: Fileq.xls Contents are below

Item Applied Qty Reference Qty
Item A 0
Item B 20 18
Item C 0
Item D 1
Item E 2

First Excel File Named: Data.xls Contents are below

Item Applied Qty Reference Qty
Item G 0
Item H 12 13
Item I 0
Item J 1 25
Item K 2

Desired Results:-

Item Applied Qty Reference Qty
Item B 20 18
Item H 12 13
Item J 1 25

Code I am trying:-

import pandas as pd
DATA_DIR = Path.cwd() / r'C:\Python'

excel001 = DATA_DIR / 'Fileq.xls'

excel002 = DATA_DIR / 'Data.xls'

df001 = pd.read_excel(excel001)

df002 = pd.read_excel(excel002)

values001 = df001

values002 = df002

dataframes = [values001, values002]

join = pd.concat(dataframes)

#Tried many different combinations but non are working as below

new_df= join['Reference Qty'].replace(r'^\s*$', np.nan, inplace=True) 

print(new_df)

Solution

  • Very close! Just missing dropna we can pass a dict to replace so as to only replace on the 'Reference Qty' column instead of replacing the entire DataFrame:

    new_df = (
        pd.concat([df1, df2])
            .replace({'Reference Qty': r'^\s*$'}, np.nan, regex=True)
            .dropna(subset=['Reference Qty'])
    )
    

    new_df:

         Item  Applied Qty Reference Qty
    1  Item B           20          18.0
    1  Item H           12          13.0
    3  Item J            1          25.0
    

    Complete Working Example:

    import numpy as np
    import pandas as pd
    
    df1 = pd.DataFrame({
        'Item': ['Item A', 'Item B', 'Item C', 'Item D', 'Item E'],
        'Applied Qty': [0, 20, 0, 1, 2],
        'Reference Qty': ['', 18.0, '', '', '']
    })
    
    df2 = pd.DataFrame({
        'Item': ['Item G', 'Item H', 'Item I', 'Item J', 'Item K'],
        'Applied Qty': [0, 12, 0, 1, 2],
        'Reference Qty': ['', 13.0, '', 25.0, '']
    })
    
    new_df = (
        pd.concat([df1, df2])
            .replace({'Reference Qty': r'^\s*$'}, np.nan, regex=True)
            .dropna(subset=['Reference Qty'])
    )
    
    print(new_df)