Search code examples
pythonpandasfilefile-manipulation

How to replace such column names whose next columns are having all nan values in Pandas


I have a file which has data separated with different spaces and column names are also having spaces.

Type Dec LookupTable               Field Name                Field Len Start Pos
NUM  0   _                         sample data               5         1
NUM  0   _                         sample data               10        6
CHAR 0   _                         sample data               60        16
NUM  0   _                         sample data               3         76
CHAR 0   _                         sample data               60        79
CHAR 0   _                         sample data               60        139
CHAR 0   _                         sample data               60        199
CHAR 0   _                         sample data               60        259
NUM  0   _                         sample data               3         319
CHAR 0   _                         sample data               60        322
CHAR 0   _                         sample data               60        382
NUM  0   _                         sample data               3         442
CHAR 0   _                         sample data               60        445

I am reading this file like this

df= pd.read_fwf('./temp.txt', colspecs= 'infer')

and getting the dataframe with columns which are separated by spaces nan values

I want to drop the Nan columns and replace its previous columns name with the empty one. How can we achieve this in an efficient way? enter image description here

Expected outpuL: Expected Output


Solution

  • import pandas as pd
    import numpy as np
    
    aaa = [(df.columns[i - 1], df.columns[i]) for i in range(1, len(df.columns)) if df[df.columns[i]].isna().all()]
    bbb = dict(aaa)
    df = df.drop(columns=np.array(aaa)[:, 1])
    df.rename(columns=bbb, inplace=True)
    
    print(df)
    

    Here, in list comprehension aaa, paired tuples are created (on the left is the name of the column to be renamed, on the right is the name of the column to be deleted). Columns that are all with empty values are checked by the condition:

    if df[df.columns[i]].isna().all()
    

    A dictionary is created from aaa. drop removes the selected columns np.array(aaa)[:, 1] (to select an array by slice I wrap it with np.array).