Search code examples
pythonpandasdataframecolormap

Merge 2 lists into dataframe and pivot based on string slice of index


I have two lists in Python, cohorts and pct_error_avgs:

['FGLMC 1.5 2020',
 'FGLMC 1.5 2021',
 'FNCI 1.5 2020',
 'FNCI 1.5 2021',
 'FNCL 1.5 2020',
 'FNCL 1.5 2021',
 'G2SF 1.5 2021',
 'FGLMC 2.5 2016',
 'FGLMC 2.5 2019',
 'FGLMC 2.5 2020',
 'FGLMC 2.5 2021',
 'FGLMC 2.5 2013',
 'FNCI 2.5 2016',
 'FNCI 2.5 2017',
 'FNCI 2.5 2019',
 'FNCI 2.5 2020',
 'FNCI 2.5 2021',
 'FNCI 2.5 2013',
 'FNCI 2.5 2014',
 'FNCI 2.5 2015',
 'FNCL 2.5 2016',
 'FNCL 2.5 2019',
 'FNCL 2.5 2020',
 'FNCL 2.5 2021',
 'FNCL 2.5 2013',
 'G2SF 2.5 2016',
 'G2SF 2.5 2019',
 'G2SF 2.5 2020',
 'G2SF 2.5 2021',
 'G2SF 2.5 2013',
 'FGLMC 3.0 2016',
 'FGLMC 3.0 2017',
 'FGLMC 3.0 2018',
 'FGLMC 3.0 2019',
 'FGLMC 3.0 2020',
 'FGLMC 3.0 2021',
 'FGLMC 3.0 2013',
 'FGLMC 3.0 2014',
 'FGLMC 3.0 2015',
 'FNCI 3.0 2016',
 'FNCI 3.0 2017',
 'FNCI 3.0 2018',
 'FNCI 3.0 2019',
 'FNCI 3.0 2020',
 'FNCI 3.0 2013',
 'FNCI 3.0 2014',
 'FNCI 3.0 2015',
 'FNCL 3.0 2016',
 'FNCL 3.0 2017',
 'FNCL 3.0 2018',
 'FNCL 3.0 2019',
 'FNCL 3.0 2020',
 'FNCL 3.0 2021',
 'FNCL 3.0 2013',
 'FNCL 3.0 2014',
 'FNCL 3.0 2015',
 'G2SF 3.0 2016',
 'G2SF 3.0 2017',
 'G2SF 3.0 2018',
 'G2SF 3.0 2019',
 'G2SF 3.0 2020',
 'G2SF 3.0 2021',
 'G2SF 3.0 2013',
 'G2SF 3.0 2014',
 'G2SF 3.0 2015',
 'FGLMC 4.5 2016',
 'FGLMC 4.5 2017',
 'FGLMC 4.5 2018',
 'FGLMC 4.5 2019',
 'FGLMC 4.5 2020',
 'FGLMC 4.5 2013',
 'FGLMC 4.5 2014',
 'FGLMC 4.5 2015',
 'FNCL 4.5 2016',
 'FNCL 4.5 2017',
 'FNCL 4.5 2018',
 'FNCL 4.5 2019',
 'FNCL 4.5 2020',
 'FNCL 4.5 2013',
 'FNCL 4.5 2014',
 'FNCL 4.5 2015',
 'G2SF 4.5 2016',
 'G2SF 4.5 2017',
 'G2SF 4.5 2018',
 'G2SF 4.5 2019',
 'G2SF 4.5 2020',
 'G2SF 4.5 2013',
 'G2SF 4.5 2014',
 'G2SF 4.5 2015',
 'FGLMC 3.5 2016',
 'FGLMC 3.5 2017',
 'FGLMC 3.5 2018',
 'FGLMC 3.5 2019',
 'FGLMC 3.5 2020',
 'FGLMC 3.5 2013',
 'FGLMC 3.5 2014',
 'FGLMC 3.5 2015',
 'FNCI 3.5 2016',
 'FNCI 3.5 2017',
 'FNCI 3.5 2018',
 'FNCI 3.5 2019',
 'FNCI 3.5 2020',
 'FNCI 3.5 2013',
 'FNCI 3.5 2014',
 'FNCI 3.5 2015',
 'FNCL 3.5 2016',
 'FNCL 3.5 2017',
 'FNCL 3.5 2018',
 'FNCL 3.5 2019',
 'FNCL 3.5 2020',
 'FNCL 3.5 2013',
 'FNCL 3.5 2014',
 'FNCL 3.5 2015',
 'G2SF 3.5 2016',
 'G2SF 3.5 2017',
 'G2SF 3.5 2018',
 'G2SF 3.5 2019',
 'G2SF 3.5 2020',
 'G2SF 3.5 2013',
 'G2SF 3.5 2014',
 'G2SF 3.5 2015',
 'FGLMC 4.0 2016',
 'FGLMC 4.0 2017',
 'FGLMC 4.0 2018',
 'FGLMC 4.0 2019',
 'FGLMC 4.0 2020',
 'FGLMC 4.0 2013',
 'FGLMC 4.0 2014',
 'FGLMC 4.0 2015',
 'FNCI 4.0 2018',
 'FNCI 4.0 2019',
 'FNCL 4.0 2016',
 'FNCL 4.0 2017',
 'FNCL 4.0 2018',
 'FNCL 4.0 2019',
 'FNCL 4.0 2020',
 'FNCL 4.0 2013',
 'FNCL 4.0 2014',
 'FNCL 4.0 2015',
 'G2SF 4.0 2016',
 'G2SF 4.0 2017',
 'G2SF 4.0 2018',
 'G2SF 4.0 2019',
 'G2SF 4.0 2020',
 'G2SF 4.0 2013',
 'G2SF 4.0 2014',
 'G2SF 4.0 2015',
 'FGLMC 5.0 2018',
 'FGLMC 5.0 2019',
 'FNCL 5.0 2017',
 'FNCL 5.0 2018',
 'FNCL 5.0 2019',
 'G2SF 5.0 2017',
 'G2SF 5.0 2018',
 'G2SF 5.0 2019',
 'FGLMC 2.0 2020',
 'FGLMC 2.0 2021',
 'FNCI 2.0 2016',
 'FNCI 2.0 2019',
 'FNCI 2.0 2020',
 'FNCI 2.0 2021',
 'FNCI 2.0 2013',
 'FNCL 2.0 2020',
 'FNCL 2.0 2021',
 'G2SF 2.0 2020',
 'G2SF 2.0 2021']

and

[18.511653803917312,
 803.3277232188467,
 20.879598352582743,
 23.061526674602863,
 12.106916228656653,
 4.564106409280872,
 16.48924134720711,
 31.897786311640903,
 90.82182204142883,
 86.08014457286167,
 8.807412993792758,
 41.90706996742475,
 19.239897510321597,
 8.544982155650809,
 28.667788897528695,
 9.354739751997611,
 9.366974044833816,
 9.648734105574324,
 9.62243743701085,
 12.097526400258708,
 26.033115248335935,
 23.84202428965662,
 79.9258393866268,
 9.81092590173131,
 37.57219192732333,
 10.58398542064468,
 13.318979596632978,
 58.172584861616485,
 97.12385204427193,
 26.371943481629547,
 14.055125952164033,
 46.923565047310916,
 52.71256227167411,
 17.78049756656653,
 8.087632477206547,
 3.1076688530915,
 25.104462153652747,
 17.724224027537982,
 39.510030612944355,
 7.696395910653633,
 15.423253687206726,
 10.09859797631793,
 17.805560841505663,
 11.261766561550226,
 11.720949451356269,
 11.043718215212518,
 7.743457385414479,
 17.02699359437735,
 9.373651957987798,
 22.082150955884632,
 25.88076428653414,
 15.027098471304884,
 4.109481626588267,
 6.660791559210973,
 31.279312276620047,
 13.092616085741861,
 34.872280483467165,
 7.761781664435596,
 5.726743560595244,
 52.0470742459319,
 20.478883582574507,
 136.593747982462,
 21.899621454350928,
 22.854157005890162,
 9.993628015815355,
 5.805938911392635,
 11.28282938139225,
 4.302673165118137,
 7.691698056981651,
 17.680837826053793,
 7.580046267785675,
 8.647750464217525,
 16.091778752144464,
 8.26397135003027,
 6.165006018896393,
 10.872103751808881,
 8.461277910286249,
 6.312261679609717,
 6.435218634806583,
 5.465541838477931,
 7.998785528527491,
 5.936519352467041,
 8.96182860117524,
 17.216898431294233,
 16.554384066658212,
 23.998139604225695,
 16.020763943016878,
 18.344547708038903,
 19.580130064246934,
 5.597251789620585,
 29.538655504372464,
 11.529460049203612,
 41.65652355563307,
 7.534706010291981,
 5.458830632047486,
 9.646033325263623,
 19.55569618202263,
 5.8850523790482825,
 8.714300991719844,
 11.58937969790251,
 24.020989884789287,
 12.664611707169488,
 5.241446537498042,
 12.042071901802451,
 5.306575084792222,
 16.87044186853652,
 17.553183695404115,
 8.840714625425962,
 15.255153205758848,
 8.605884120635006,
 15.785374934287246,
 15.382184721845867,
 10.915574805453202,
 12.988658133115479,
 13.863529546392472,
 7.502417114069642,
 178.71070203142534,
 3.7609967030133875,
 11.710794989255119,
 8.32253140731873,
 7.496452509005945,
 12.174906387799387,
 8.15777675890506,
 6.910224847597885,
 10.813336358980777,
 16.732558124259025,
 4.588704356916162,
 3.362051953207668,
 5.458219692259153,
 7.220815479162765,
 17.921978793919504,
 8.27032695310345,
 8.400000699797062,
 9.986367527402269,
 12.274836294499117,
 6.06966957922299,
 10.490697503197428,
 21.60567669644365,
 6.946606113500815,
 11.206916281150407,
 9.575522100260692,
 7.000956510729425,
 83.64785205790251,
 19.688890250007574,
 17.86966050430724,
 6.351466415746062,
 17.91235066627347,
 4.551121378944374,
 7.633491548751053,
 8.19547988827195,
 9.744473791762092,
 6.99900570227771,
 8.263462235765623,
 9.266876117031291,
 13.606151924626023,
 16.810038670928304,
 15.863699290846276,
 7.505931633618261,
 40.82467677144624,
 20.604082534244025,
 10.805561563018292,
 4.646700764970427,
 29.129671104894005,
 105.49166466322329,
 65.60051590275509,
 32.123066059800884]

I've combined them into a single dataframe and applied some color mapping to the cells:

lr_pct_errors = pd.DataFrame(pct_error_avgs, cohorts)
def _color_red_or_green(val):
    color = 'red' if val >10 else 'green'
    return 'color: %s' % color
lr_pct_errors[0:20].style.applymap(lambda x: "background-color: red" if x>10 else "background-color: green")

And this is what the first 20 rows look like:

df

My problem is that I want to basically pivot the whole thing and break the rows out into columns based on the year in each index, and leaving the index just as the cohort (FNCI 2.5 for example) so all of the years for FNCI 2.5 are in one row instead of many. The desired output would look something like this

pic 2

But with all the cohorts in the list (not just those two in the example), and for cohorts without all of the years it would just be NaN. I imagine this could be accomplished with some sort of pivot or melt function and by slicing on the indices to pull out just the years, but not totally sure. Any help would be appreciated.


Solution

  • You can extract the year from your index by first resetting the index to be a normal column. Then use string split to get the year and put that in it's own column. Then remove the year from the string column and then use the pivot function.

    lr_pct_errors = lr_pct_errors.reset_index()
    lr_pct_errors['year'] = lr_pct_errors['index'].str.split(expand=True)[2].astype('int')
    lr_pct_errors['index'] = lr_pct_errors['index'].str[:-5]
    lr_pct_errors.pivot(columns='year', index='index', values=0)