python, pandas, frozenset

Extract Frozenset items from Pandas Dataframe


I have a dataframe (originally shown as an image) whose columns "antecedents" and "consequents" hold frozenset values.

I would like to convert the columns "antecedents" and "consequents" to strings, dropping the "frozenset({ ... })" wrapper, so that every row holds:

"VENTOLIN S.INAL200D 100MCG" instead of frozenset({"VENTOLIN S.INAL200D 100MCG"}).

I managed to achieve the result with:

prod = []

for i in df["antecedents"]:
    prod.append(str(i))

new_set = {x.replace('frozenset', ''
                     ).replace('})', ''
                        ).replace('({', ''
                        ).replace("'", "") for x in prod}

Is there a more pythonic solution?


Solution

  • First convert the values to tuples or lists, then use DataFrame.explode to get one row per item:

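    import pandas as pd

    df = pd.DataFrame({
             'antecedents':[frozenset({'aaa', 'bbb'})] * 3 + [frozenset({'nbb'})] * 3,
             'consequents':[frozenset({'ccc'})] * 3 + [frozenset({'nbb', 'ddd'})] * 3,
             'C':[1,3,5,7,1,0],
    })

    cols = ['antecedents','consequents']
    # Convert each frozenset to a tuple so that explode can unpack it.
    # Series.map is applied per column; DataFrame.applymap is deprecated since pandas 2.1.
    # Item order inside each tuple may differ from the output below, as sets are unordered.
    df[cols] = df[cols].apply(lambda col: col.map(tuple))
    print(df)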
      antecedents consequents  C
    0  (bbb, aaa)      (ccc,)  1
    1  (bbb, aaa)      (ccc,)  3
    2  (bbb, aaa)      (ccc,)  5
    3      (nbb,)  (nbb, ddd)  7
    4      (nbb,)  (nbb, ddd)  1
    5      (nbb,)  (nbb, ddd)  0
    

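    # explode repeats the original index label for every item it unpacks,
    # so the index is reset after each step to keep the labels unique.
    df1 = (df.explode('antecedents')
             .reset_index(drop=True)
             .explode('consequents')
             .reset_index(drop=True))
    print(df1)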
       antecedents consequents  C
    0          bbb         ccc  1
    1          aaa         ccc  1
    2          bbb         ccc  3
    3          aaa         ccc  3
    4          bbb         ccc  5
    5          aaa         ccc  5
    6          nbb         nbb  7
    7          nbb         ddd  7
    8          nbb         nbb  1
    9          nbb         ddd  1
    10         nbb         nbb  0
    11         nbb         ddd  0
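
If one string per cell is preferred over one row per item, which is closer to what the question describes, a possible alternative is to join the items of each frozenset. This is only a sketch: it reuses the sample data from above, the name df_str and the ', ' separator are arbitrary choices, and sorted() is used only to get a stable order, since a frozenset has none.

```python
import pandas as pd

# Sample data reused from the answer above.
df = pd.DataFrame({
    'antecedents': [frozenset({'aaa', 'bbb'})] * 3 + [frozenset({'nbb'})] * 3,
    'consequents': [frozenset({'ccc'})] * 3 + [frozenset({'nbb', 'ddd'})] * 3,
    'C': [1, 3, 5, 7, 1, 0],
})

cols = ['antecedents', 'consequents']

# Work on a copy so the original frozenset columns stay untouched.
df_str = df.copy()
for col in cols:
    # Join the items of each frozenset into a single string.
    # sorted() fixes the item order, because frozensets are unordered.
    df_str[col] = df_str[col].map(lambda s: ', '.join(sorted(s)))

print(df_str)
```

A frozenset with a single item, as in the question, becomes just that item as a plain string; a multi-item frozenset becomes a comma-separated string such as 'aaa, bbb'.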