How to expand a list of features based off one column in pandas?

I have the following example df:

year   school    occupation     a      b     c
2020   GA Tech     engineer     2     3      4
2020  GA Tech      Business     1     5      4
.
.
.

There are many of the same occupations. I want the following dataframe:

year   school       engineer_a     engineer_ b     engineer_c Business_a  Business_b Businesss_c
2020   GA Tech       2                3                 4         1         5             4

where it is transformed into one big row for that year. How would I do this?

I have tried pivot tables and I couldn't figure out a way to make it work.

Solution

You could try to use .pivot in the following way:

df = (df.pivot(index=["year", "school"], columns="occupation")
      .sort_index(level=1, axis=1))
df.columns = [f"{b}_{a}" for a, b in df.columns]
df = df.reset_index()

Result for the short sample:

   year   school  Business_a  Business_b  Business_c  engineer_a  engineer_b  engineer_c
0  2020  GA Tech           1           5           4           2           3           4

If your comment "... there can be many of the same occupation" means that df could look like

   year   school occupation    a    b    c
0  2020  GA Tech   engineer    2    3    4
1  2020  GA Tech   engineer   20   30   40
2  2020  GA Tech   Business    1    5    4
3  2020  GA Tech   Business   10   50   40
4  2020  GA Tech   Business  100  500  400

then try

df = (
    df.assign(count=df.groupby(["year", "school", "occupation"]).cumcount())
    .pivot(index=["count", "year", "school"], columns="occupation")
    .sort_index(level=1, axis=1)
)
df.columns = [f"{b}_{a}" for a, b in df.columns]
df = df.reset_index().drop(columns="count")

to get

   year   school  Business_a  Business_b  Business_c  engineer_a  engineer_b  engineer_c
0  2020  GA Tech         1.0         5.0         4.0         2.0         3.0         4.0
1  2020  GA Tech        10.0        50.0        40.0        20.0        30.0        40.0
2  2020  GA Tech       100.0       500.0       400.0         NaN         NaN         NaN