python excel pandas row

Removing entirely NA rows in Excel files with Python


I am writing a program that merges Excel files, and I found that the files contain many blank rows. I tried to remove them, but it does not work. My code is as follows.

def btncmd():

    total_df = pd.DataFrame()

    for i in file_content:
        df = pd.read_excel(i)
        total_df = total_df.append(df, ignore_index = True)

file_content is a list of Excel files. I have also attached an image of sample Excel files.

        total_df.dropna(axis=0, how='all')
    total_df

I found that there are many blank rows in the Excel file. Excel rows[2:3] are fine, but the other blank rows need to be removed. I used dropna(how='all') to remove them, but it does not work in my code.

    merged_excel_file = "D:/사업/발주 및 정산/정산/정산서/"+ txt_file.get()  + ".xlsx"

    total_df.to_excel(merged_excel_file, sheet_name = "sheet", index=False)

    print('생성파일:', merged_excel_file)  # '생성파일' means "generated file"

I changed the output from a CSV file to an XLSX file and saved it. It works well, but the title row of each Excel file is repeated, and every copy except the first one should be removed. This is visible in the image of the Excel file.

btn = Button(frame_folder, text='클릭', command=btncmd)  # '클릭' means "click"
btn.pack(side='left', padx=6, pady=5)

Solution

  • I think you can just drop the rows with NaN values after reading each file:

    df = pd.read_excel(i)
    df.dropna(inplace=True)
    

    That should remove any row that contains a NaN value. If you use how='all', it only removes rows where all the values are NaN, i.e. a row with nine NaN values and one column that contains something will not be removed.

    You also need to include inplace=True so that the call modifies the DataFrame. Without it, dropna() leaves the original DataFrame unchanged and instead returns a new DataFrame with the NaN rows removed, so the result has to be assigned back, e.g. df = df.dropna().
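
To make the two points above concrete, here is a minimal sketch on a small in-memory DataFrame rather than a real Excel file. The column names and values are made up for illustration and are not taken from the question's spreadsheets. It shows that a bare dropna() call changes nothing, and how how='all' differs from the default how='any'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for one sheet: row 1 is entirely
# blank, row 2 is only partially filled.
df = pd.DataFrame(
    {
        "item": ["A", np.nan, "C"],
        "price": [100.0, np.nan, np.nan],
    }
)

# dropna() returns a new DataFrame. The result is discarded here, so df
# itself still has all three rows.
df.dropna(how="all")
rows_after_discarded_call = len(df)

# how="all" drops only the row in which every column is NaN.
only_blank_removed = df.dropna(how="all")

# The default how="any" also drops the partially filled row.
any_nan_removed = df.dropna()

print(rows_after_discarded_call, len(only_blank_removed), len(any_nan_removed))
```

If the goal is only to get rid of completely empty rows, as the question title suggests, then assigning the result of dropna(how='all') back to the variable is probably the closer fit, since the default also discards rows that are merely incomplete.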