Search code examples
pythonpandasloopspivot-table

Create multiple pivot tables assigned to dataframes after iterating


Sample data set:

  ID  1  2  3  X  Y  Z
0   1  2  1  2  3  3  4
1   2  1  3  1  4  3  4
2   3  2  2  1  2  4  3
3   4  3  2  1  2  3  3
4   5  1  2  2  1  3  2
5   6  2  3  2  4  4  2
​
cross1 = pd.crosstab(sample["1"], org1_df["X"])
cross2 = pd.crosstab(sample["2"], org1_df["X"])
cross3 = pd.crosstab(sample["3"], org1_df["X"])
cross4 = pd.crosstab(sample["1"], org1_df["Y"])
cross5 = pd.crosstab(sample["2"], org1_df["Y"])
cross6 = pd.crosstab(sample["3"], org1_df["Y"])
cross7 = pd.crosstab(sample["1"], org1_df["Z"])
etc.

I want to loop through this code replacing "Column 1" and "Column X" with new columns ("Column 2" and "Column Y") in order to produce a new crosstab and assign that crosstab to a new dataframe. It works once, manually, very easily. This provides a count of the answers to a Survey Question by category, in this case, Business Type.

1 = Large Business
2 = Small Business
3 = Non-profit 
    cross1 = pd.crosstab(sample["1"], sample["X"])
print(cross1)
X  1  2  3  4
1            
1  1  0  0  1
2  0  1  1  1
3  0  1  0  0

I need the iteration so I have multiple dataframes:

cross1 cross2 cross3 cross4 ... etc.

demo_questions = 
['1', '2', '3']

survey_questions = 
['X', 'Y', 'Z']

for d, s in [demo_questions, survey_questions]:
    cross[d] = pd.crosstab(sample[d], sample[s])

I tried the above but received the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[37], line 1
----> 1 for d, s in [demo_questions, survey_questions]:
      2     cross[d] = pd.crosstab(sample[d], sample[s])

ValueError: too many values to unpack (expected 2)


Solution

  • Create a dictionary to store the pivot tables, then iterate over the combinations of demo questions and survey questions and generate frequecy tables inside a dict comprehension

    cross = {
        f'{d}_{s}': 
            pd.crosstab(df[d], df[s])
        for d in demo_questions
        for s in survey_questions
    }
    

    Now you can access the results by indexing the dictionary

    print(cross['1_X'])
    
    X  1  2  3  4
    1            
    1  1  0  0  1
    2  0  1  1  1
    3  0  1  0  0