I have got a bunch of reports, all containing the same information, but the columns are inconsistent in order and titles, so when I run the functions to parse the information I am not getting out what I want.
How can I use python to organise the columns in a consistent way to work with later down the line?
Currently the columns look like;
['Comment', 'Rec Date', 'Receipt Number', 'Rec Amt', 'Receipt Method', 'Status']
['Comment', 'State', 'Rec No', 'Rec Date', 'Rec Amt', 'Status']
['Receipt Number', 'Rec Date', 'Rec Amt', 'Comment', 'State', 'Status']
['Comment', 'Receipt Number', 'Rec Date', 'Rec Amt', 'State', 'Rec Method']
['Comment', 'Receipt Number', 'Rec Date', 'Rec Amt', 'Receipt Method', 'State']
['Comment', 'Receipt Number', 'Rec Date', 'Rec Amt', 'State', 'Rec Method']
I would like all of these categories to be included as columns. (Points that don't have a value for a specific column would get some placeholder or empty value.)
First read_excel
for DataFrame
:
df = pd.read_excel('file.xlsx')
I believe you need rename
columns by dict and then sort them by sort_index
:
d = {'Rec No':'Receipt Number', 'Rec Method':'Receipt Method',...}
df = df.rename(columns=d).sort_index(axis=1)