Search code examples
pythonpandasdataframemelt

new dataframe using values in existing dataframe


exdf = pd.DataFrame({'Employee name': ['Alex','Mike'],
 '2014.1': [5, 2], '2014.2': [3, 4], '2014.3': [3, 6], '2014.4': [4, 3], '2015.1': [7, 5], '2015.2': [5, 4]})
exdf


  Employee name  2014.1  2014.2  2014.3  2014.4  2015.1  2015.2
0          Alex       5       3       3       4       7       5
1          Mike       2       4       6       3       5       4

Suppose the above dataframe has several such rows and columns with output from each employee for each quarter. I want to create a new dataframe with columns:

newdf=pd.Dataframe(columns=['Employee name','Year','Quarter','Output'])

So, new dataframe will have nxm rows where n and m are rows and columns in original dataframe. What I have tried is filling every row and column entry using nested for loop.

But I'm sure there is a more efficient method.

for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        newdf.iloc[?]=exdf.iloc[?]


Solution

  • Use DataFrame.melt with Series.str.split, last change order of columns:

    df = exdf.melt('Employee name', var_name='Year', value_name='Output')
    df[['Year', 'Quarter']] = df['Year'].str.split('.', expand=True)
    df = df[['Employee name','Year','Quarter','Output']]
    print (df)
       Employee name  Year Quarter  Output
    0           Alex  2014       1       5
    1           Mike  2014       1       2
    2           Alex  2014       2       3
    3           Mike  2014       2       4
    4           Alex  2014       3       3
    5           Mike  2014       3       6
    6           Alex  2014       4       4
    7           Mike  2014       4       3
    8           Alex  2015       1       7
    9           Mike  2015       1       5
    10          Alex  2015       2       5
    11          Mike  2015       2       4