I have a file which has data separated with different spaces and column names are also having spaces.
Type Dec LookupTable Field Name Field Len Start Pos
NUM 0 _ sample data 5 1
NUM 0 _ sample data 10 6
CHAR 0 _ sample data 60 16
NUM 0 _ sample data 3 76
CHAR 0 _ sample data 60 79
CHAR 0 _ sample data 60 139
CHAR 0 _ sample data 60 199
CHAR 0 _ sample data 60 259
NUM 0 _ sample data 3 319
CHAR 0 _ sample data 60 322
CHAR 0 _ sample data 60 382
NUM 0 _ sample data 3 442
CHAR 0 _ sample data 60 445
I am reading this file like this
df= pd.read_fwf('./temp.txt', colspecs= 'infer')
and getting the dataframe with columns which are separated by spaces nan values
I want to drop the Nan columns and replace its previous columns name with the empty one. How can we achieve this in an efficient way?
import pandas as pd
import numpy as np
aaa = [(df.columns[i - 1], df.columns[i]) for i in range(1, len(df.columns)) if df[df.columns[i]].isna().all()]
bbb = dict(aaa)
df = df.drop(columns=np.array(aaa)[:, 1])
df.rename(columns=bbb, inplace=True)
print(df)
Here, in list comprehension aaa
, paired tuples are created (on the left
is the name of the column to be renamed, on the right
is the name of the column to be deleted). Columns that are all with empty values are checked by the condition:
if df[df.columns[i]].isna().all()
A dictionary
is created from aaa
. drop
removes the selected columns np.array(aaa)[:, 1]
(to select an array by slice I wrap it with np.array).