SUMMARY of my problem:
I have many DataFrames, all with the SAME POOL of columns (7 columns, for example COLUMN1:COLUMN7), BUT sometimes one or more columns are missing (i.e. a DataFrame might have COLUMN1:COLUMN3 + COLUMN6:COLUMN7, hence 4th and 5th column missing).
Each DataFrame has columns arranged in different order every time (i.e. df1 has its order, df2 has another order, df3 yet another order and so on...).
I want to arrange the columns in each DataFrame based on a list of columns that serves as a benchmark (in this case a list of columns from 1 to 7).
The desired result would be for all the Dataframes to have the same column order based on this list, if a column is missing the order should be preserved (i.e. if column 4 and 5 are missing, the order of the columns should be: COL1, COL2, COL3, COL6, COL7).
More detailed description:
I have several DataFrames in my code that are produced by cleaning some datasets. Each one of these DataFrames has DIFFERENT NUMBER of columns and in DIFFERENT ORDER, BUT the columns are limited to this list: 'id', 'title', 'type', 'category', 'secondary category', 'date', 'description'
. Hence the columns can be 7 at most, from this list. Example:
DataFrame1 'id', 'title', 'date', 'category', 'type', 'description', 'secondary category'
DataFrame2 'id', 'description', 'title', 'type', 'category', 'date'
DataFrame3 'id', 'category', 'description', 'title'
DESIRED OUTPUT:
I would like to order the columns based on the initial list 'id', 'title', 'type', 'category', 'secondary category', 'date', 'description'
, even if the number of columns varies.
From the example above the DataFrames should become:
DataFrame1 'id', 'title', 'type', 'category', 'secondary category', 'date', 'description'
DataFrame2 'id', 'title', 'type', 'category', 'date', 'description'
DataFrame3 'id', 'title', 'category', 'description'
is there a way, a loop for example, to arrange the columns this way?
You can use list comprehension to sort the order of the columns and use reindex
to set the right order:
desired_order = ['id', 'title', 'type', 'category', 'secondary category', 'date', 'description']
df = df.reindex([i for i in desired_order if i in df.columns], axis=1)