Search code examples
python-3.xpandasdataframesummarize

Summarizing dataframe string values to count in Python 3


in the screenshot below you'll find a dataframe that contains string values in each cell. What i would like to do is to create a new dataframe out of this one that contains 3 columns: 'Very interested' 'Somewhat interested', and 'Not interested'. I don't know how to transform the original df into this new one, i tried just counting the values that meets a condition like 'Very interested' and putting them into a new df but the numbers don't seem right.

i would appreciate any help here. Thank you.

Df to transform EDIT: here is also the code to reproduce a dataframe similar to the one in the screenshot:

df = pd.DataFrame({1: ['Very interested', 'Not interested', 'Somewhat interested', 'Very interested', 'Not interested', 'Somewhat interested'], 2: ['Very interested', 'Not interested', 'Somewhat interested', 'Very interested', 'Not interested', 'Somewhat interested'], 3: ['Very interested', 'Not interested', 'Somewhat interested', 'Very interested', 'Not interested', 'Somewhat interested'], 4: ['Very interested', 'Not interested', 'Somewhat interested', 'Very interested', 'Not interested', 'Somewhat interested'], 5: ['Very interested', 'Not interested', 'Somewhat interested', 'Very interested', 'Not interested', 'Somewhat interested'], 6: ['Very interested', 'Not interested', 'Somewhat interested', 'Very interested', 'Not interested', 'Somewhat interested']}, 
                 index=['Big Data','Data Analysis','Data Journalism', 'Data Visualization', 'Deep Learning', 'Machine Learning'])

As per the desired output, it should be something like this:

Desired output


Solution

  • I think need reshape by melt and then get counts by GroupBy.size with Series.unstack:

    df = (df.rename_axis('val')
            .reset_index()
            .melt('val', var_name='a', value_name='b')
            .groupby(['val','b'])
            .size()
            .unstack(fill_value=0))
    

    Another solution withstack, counts by SeriesGroupBy.value_counts with Series.unstack:

    df = (df.stack()
            .groupby(level=0)
            .value_counts()
            .unstack(fill_value=0))