Search code examples
pythonpython-3.xpandasdataframemulti-value-dictionary

How to compare string value in dataframe column and value in cell to create new dataframe based on multi value dictionary?


I have a df that looks like this:

id.1.value.1        id.2.value.2      id.1.question    id.2.value.2
TRUE                     FALSE             TRUE             TRUE

I want to create logic that scans the column names of the df and extracts the last number only from column names that have value in column name and compare the value in the cell of the column that contains value with following logic:

  1. If value in column value equal TRUE then compare the last number in multi value dictionary

  2. use second value in multi key dictionary to create dataframe column names

Example:

my_dict = {1: ('a', 'category'),2: ('b', 'category'),\
           3: ('c', 'category'),4:('d','category'),\
           5:('e','subcategory'),6:('f','subcategory'),\
           7:('g','subcategory'),8:('h','subcategory'),\
           9:('i','subcategory'),10:('j','subcategory'),\
           11:('k','subcategor'),12:('l','subcategory'),\
           13:('m','subcategory'),14:('n','subcategory'),\
           15:('o','subcategory'),16:('p','subcategory'),\
           17:('q','subcategory'),18:('r','subcategory'),\
           19:('s','subcategory'),20:('t','subcategory'),\
           21:('u','subcategory'),22:('v','subcategory'),\
           23:('w','subcategory'),24:('x','subcategory')

           }

If my current df looks likes this:

id.1.value.1        id.2.value.2      id.1.question    id.6.value.6
    TRUE                 FALSE             TRUE             TRUE

New df should look like this:

category    subcategory
a               f

Solution

  • names = df.columns
    new_df = pd.DataFrame()
    for name in names:    
        if ('value' in name) & df[name][0]:
            last_number = int(name[-1])
            key, value = my_dict[last_number]
            try:
                new_df[value][0] = list(new_df[value][0]) + [key]
            except:
                new_df[value] = [key]