I have the following list of columns of a data frame df
cols_2 = ['state', 'population', 'male population', 'female population', 'working population', 'male working population', 'female working population' 'female population in the age group 0 to 6 years', 'male population in the age group 0 to 6 years', 'population in the age group 0 to 6 years']
For natural reasons I would like to compress names as follows:
Any occurrences of population
to pop
, male
to m_
, female
to f_
, working
to w
and in the age group 0 to 6 years
to _minor
Please note the spaces being included in the pattern
This Stack overflow Discussion is the starting point, where the requirement is only getting rid off square brackets by matching to a single pattern.
My aim is to obtain multiple matches for multiple patterns
Really appreciate any kind of help!
PS: This is my first time here!
You can try:
import re
new_cols = []
pat = re.compile(r"(?:fe)?male\s*|population|working\s*")
for col in cols_2:
new_col = pat.sub(
lambda g: f"{g[0][0]}_" if g[0][0] in "fmw" else f"{g[0][:3]}", col
)
new_col = new_col.replace(" in the age group 0 to 6 years", "_minor")
new_cols.append(new_col)
print(new_cols)
# to replace the columns in DataFrame then:
# df.columns = new_cols
Prints:
[
"state",
"pop",
"m_pop",
"f_pop",
"w_pop",
"m_w_pop",
"f_w_pop",
"f_pop_minor",
"m_pop_minor",
"pop_minor",
]