Search code examples
pythonpandasqlikviewqliksense

Doing Crosstable in Pandas like in Qlik?


I have a dataframe:

    df1=pd.DataFrame({
        'ID':[101,102],
        'Name':['Axel','Bob'],
        'US':['GrA','GrC'],
        'Europe':['GrB','GrD'],
        'AsiaPac':['GrZ','GrF']
     })

Which I want to change to this:

    df2=pd.DataFrame({
    'ID':[101,101,101,102,102,102],
    'Name':['Axel','Axel','Axel','Bob','Bob','Bob'],
    'Region':['US','Europe','AsiaPac','US','Europe','AsiaPac'],
    'Group':['GrA','GrB','GrZ','GrC','GrD','GrF']
})

How do I do it? There is a crosstab function in pandas but it doesn't do this. In Qlik I would simply do

    Crosstable(Region,Group,2)  
    LOAD
        ID,
        Name,
        US,
        Europe,
        AsiaPac

And I would go from df1 to df2. How can I do this in python (pandas or otherwise)?


Solution

  • This is essentially reshaping your data from a wide format to a long format, as it's known in R parlance. In pandas, you can do this with pd.melt:

    pd.melt(df1, id_vars=['ID', 'Name'], var_name='Region', value_name='Group')
    #     ID  Name   Region Group
    # 0  101  Axel  AsiaPac   GrZ
    # 1  102   Bob  AsiaPac   GrF
    # 2  101  Axel   Europe   GrB
    # 3  102   Bob   Europe   GrD
    # 4  101  Axel       US   GrA
    # 5  102   Bob       US   GrC
    

    If you need your columns sorted on ID or Name and Group, as in your example output, you can add .sort_values() to the expression:

    pd.melt(df1, id_vars=['ID', 'Name'], var_name='Region', value_name='Group').sort_values(['ID', 'Group'])
    #     ID  Name   Region Group
    # 4  101  Axel       US   GrA
    # 2  101  Axel   Europe   GrB
    # 0  101  Axel  AsiaPac   GrZ
    # 5  102   Bob       US   GrC
    # 3  102   Bob   Europe   GrD
    # 1  102   Bob  AsiaPac   GrF