Get boolean expression from hierarchical Pandas DataFrame


The DataFrame is given as:

import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6, 7, 8],
        "parent_id": [0, 0, 1, 1, 2, 2, 4, 4],
        "value": ["a>2", "b<4", "d>5", "e<3", "h>1", "i>10", "f>3", "g>2"],
    }
)

I need to get this string expression:

"or(and (a > 2, 
     or(d > 5, 
        and (e<3,
             or (f > 3, 
                 g > 2)
            )  
       )
    ),

    and(b < 4,
        or(h > 1,
           i > 10)
       )
  )"

That is, all children of a node should be arguments of an "OR" function, and a parent together with the "OR" of its children should be arguments of an "AND" function.
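
For example, node 4 (`e<3`) has children 7 (`f>3`) and 8 (`g>2`), so the rule applied to that node alone would look like the sketch below. The helpers `or_` and `and_` are hypothetical names used only for illustration; they are not existing functions.

```python
# Hypothetical helpers that only build strings; names are illustrative.
def or_(*args):
    return "or(" + ", ".join(args) + ")"

def and_(*args):
    return "and(" + ", ".join(args) + ")"

# Siblings 7 and 8 go inside or(...); the parent (node 4) and that
# group then go inside and(...).
node4 = and_("e<3", or_("f>3", "g>2"))
print(node4)  # and(e<3, or(f>3, g>2))
```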


Solution

  • IIUC, you can use a recursive function like in your previous question:

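    # Map each parent_id to the list of its child ids
    children = df.groupby('parent_id')['id'].agg(list).to_dict()
    # Map each id to its expression string
    values = dict(zip(df['id'], df['value']))
    
    def fold(c):
        # A list of sibling ids becomes or(...)
        if isinstance(c, list):
            return 'or(%s)' % ','.join(map(fold, c))
        # A node with children becomes and(own value, or(children))
        if c in children:
            return f'and({values.get(c)},{fold(children.get(c))})'
        # A leaf is just its own value
        return values.get(c)
    
    out = fold(children[0])  # start from the children of the root (parent_id 0)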
    

    Output:

    or(and(a>2,or(d>5,and(e<3,or(f>3,g>2)))),and(b<4,or(h>1,i>10)))
    

    Used input:

    df = pd.DataFrame(
        {
            'id': [1, 2, 3, 4, 5, 6, 7, 8],
            'parent_id': [0, 0, 1, 1, 2, 2, 4, 4],
            'value': ['a>2', 'b<4', 'd>5', 'e<3', 'h>1', 'i>10', 'f>3', 'g>2'],
        }
    )
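
As a sanity check that does not need pandas, the sketch below writes out the `children` and `values` dictionaries as plain literals and applies the same recursion. The literal contents are an assumption about what the `groupby` and `zip` lines in the solution produce for this input.

```python
# Assumed contents of the two lookup tables for the input above.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6], 4: [7, 8]}
values = {1: 'a>2', 2: 'b<4', 3: 'd>5', 4: 'e<3',
          5: 'h>1', 6: 'i>10', 7: 'f>3', 8: 'g>2'}

def fold(c):
    # A list of sibling ids becomes or(...)
    if isinstance(c, list):
        return 'or(%s)' % ','.join(map(fold, c))
    # A node with children becomes and(own value, or(children))
    if c in children:
        return f'and({values[c]},{fold(children[c])})'
    # A leaf is just its own value
    return values[c]

out = fold(children[0])  # children of the root (parent_id 0)
print(out)
# or(and(a>2,or(d>5,and(e<3,or(f>3,g>2)))),and(b<4,or(h>1,i>10)))
```

This assumes the `id`/`parent_id` pairs form a proper tree. If a node were its own ancestor, the recursion would not terminate normally and would hit Python's recursion limit.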