Search code examples
pythonpandasdimensions

Changing a many column dataframe to a single column without losing information in python


I want to see changes in cell values when comparing multiple dataframes. These dataframes are formed from JSON data, which forms many-column tables, and I cannot easily change this data source. Let's say there are 10 dataframes with 10 rows and 10 columns (equally labelled). I'd like to compare the information, by turning each dataframe into 100 row and 1 column.

for 3x3 example:

import pandas as pd
data = [{'a':1,'b':2,'c':3},{'a':10,'b':20,'c':30},{'a':100,'b':200,'c':300}]
df = pd.DataFrame(data)
df.index = ['x','y','z']

gives this table

     a    b    c
x    1    2    3
y   10   20   30
z  100  200  300

but I would like to have:

    col
xa  1
xb  2
xc  3
ya  10
yb  20
yc  30
za  100
zb  200
zc  300

so that I may then add many columns and compare values changes.

Can somebody advise me on how to do this using pandas? It is okay if a third colum is required, i.e.:

1  2  3
x  a  1
x  b  2
x  c  3
y  a  10
y  b  20
y  c  30
z  a  100
z  b  200
z  c  300

Solution

  • Use DataFrame.stack with Series.to_frame and then flatten MultiIndex to index with map:

    df_us = df.stack().to_frame('col')
    df_us.index = df_us.index.map(lambda x: f'{x[0]}{x[1]}')
    print (df_us)
        col
    xa    1
    xb    2
    xc    3
    ya   10
    yb   20
    yc   30
    za  100
    zb  200
    zc  300
    

    For 3 columns:

    df_us = df.stack().reset_index()
    df_us.columns = [0,1,2]
    print (df_us)
       0  1    2
    0  x  a    1
    1  x  b    2
    2  x  c    3
    3  y  a   10
    4  y  b   20
    5  y  c   30
    6  z  a  100
    7  z  b  200
    8  z  c  300