How to explode Python Pandas Dataframe based on string and criteria


How can I turn this StringDataFrame:

String
Jon likes {ExplodeAnimals}.
Jon eats {ExplodeFruit}.

Into this:

String
Jon likes Cats.
Jon likes Dogs.
Jon likes Tigers.
Jon likes Llamas.
Jon eats Apples.
Jon eats Pears.
Jon eats Bananas.
Jon eats Strawberries.

Based on this ThingsDataFrame:

Thing Type
Cats animal
Dogs animal
Tigers animal
Llamas animal
Apples fruit
Pears fruit
Bananas fruit
Strawberries fruit
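
For anyone who wants to reproduce this, the two frames above could be built roughly as follows. This is only a sketch: the column names come from the tables, and the default integer index is an assumption.

```python
import pandas as pd

# Templates with one placeholder in curly braces per row.
StringDataFrame = pd.DataFrame({
    "String": ["Jon likes {ExplodeAnimals}.", "Jon eats {ExplodeFruit}."]
})

# Lookup table: each Thing belongs to exactly one Type.
ThingsDataFrame = pd.DataFrame({
    "Thing": ["Cats", "Dogs", "Tigers", "Llamas",
              "Apples", "Pears", "Bananas", "Strawberries"],
    "Type": ["animal"] * 4 + ["fruit"] * 4,
})
```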

Solution

  • option 1

    You can extract the placeholder with str.extract, translate it to a Type with map, then merge with ThingsDataFrame.

    # you could skip this mapping if you used "Jon likes {animal}."
    mapper = {'ExplodeAnimals': 'animal', 'ExplodeFruit': 'fruit'}
    
    out = (StringDataFrame['String']
      .str.extract(r'(?P<String>.*) {(?P<Type>.*)}')
      .assign(Type=lambda d: d['Type'].map(mapper))
      .merge(ThingsDataFrame, on='Type')
      .assign(String=lambda d: d['String']+' '+d['Thing'])
      [['String']]
    )
    
    print(out)
    

    Output:

                      String
    0         Jon likes Cats
    1         Jon likes Dogs
    2       Jon likes Tigers
    3       Jon likes Llamas
    4        Jon eats Apples
    5         Jon eats Pears
    6       Jon eats Bananas
    7  Jon eats Strawberries
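
Note that the pattern above discards whatever follows the closing brace, which is why the trailing period from the question is missing in this output. A possible variant that keeps it is sketched below. The group names Prefix and Suffix are my own choice for illustration, and the sample frames are rebuilt so the snippet runs on its own.

```python
import pandas as pd

StringDataFrame = pd.DataFrame({
    "String": ["Jon likes {ExplodeAnimals}.", "Jon eats {ExplodeFruit}."]
})
ThingsDataFrame = pd.DataFrame({
    "Thing": ["Cats", "Dogs", "Tigers", "Llamas",
              "Apples", "Pears", "Bananas", "Strawberries"],
    "Type": ["animal"] * 4 + ["fruit"] * 4,
})

mapper = {"ExplodeAnimals": "animal", "ExplodeFruit": "fruit"}

out = (
    StringDataFrame["String"]
    # Prefix = text before the braces, Type = placeholder name,
    # Suffix = text after the braces (hypothetical group names).
    .str.extract(r"(?P<Prefix>.*)\{(?P<Type>[^{}]*)\}(?P<Suffix>.*)")
    .assign(Type=lambda d: d["Type"].map(mapper))
    .merge(ThingsDataFrame, on="Type")
    # Prefix already ends with the space that preceded the brace.
    .assign(String=lambda d: d["Prefix"] + d["Thing"] + d["Suffix"])
    [["String"]]
)

print(out)
```

Each row now ends with the period from the original template, e.g. "Jon likes Cats." instead of "Jon likes Cats".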
    

  • option 2

    This is probably less efficient but more versatile: it uses the curly bracket notation to perform brace expansion (with the braceexpand module):

    # pip install braceexpand
    from braceexpand import braceexpand
    
    mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)
    
    (StringDataFrame['String']
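     # regex=True is needed for a callable replacement (pandas >= 2.0 defaults to regex=False)
     .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)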
     .apply(lambda x: list(braceexpand(x)))
     .explode()
    )
    

    NB. This assumes the StringDataFrame input has been simplified to:

                    String
    0  Jon likes {animal}.
    1    Jon eats {fruit}.
    

    Output:

    0           Jon likes Cats.
    0           Jon likes Dogs.
    0         Jon likes Tigers.
    0         Jon likes Llamas.
    1          Jon eats Apples.
    1           Jon eats Pears.
    1         Jon eats Bananas.
    1    Jon eats Strawberries.
    Name: String, dtype: object
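
To see what the first two steps hand over to braceexpand, here is a sketch of the intermediate result using pandas only. The variable name templates and the fallback for unknown placeholder names are assumptions added for illustration, not part of the code above.

```python
import pandas as pd

StringDataFrame = pd.DataFrame({
    "String": ["Jon likes {animal}.", "Jon eats {fruit}."]
})
ThingsDataFrame = pd.DataFrame({
    "Thing": ["Cats", "Dogs", "Tigers", "Llamas",
              "Apples", "Pears", "Bananas", "Strawberries"],
    "Type": ["animal"] * 4 + ["fruit"] * 4,
})

# One comma-separated string per Type, indexed by Type.
mapper = ThingsDataFrame.groupby("Type")["Thing"].agg(",".join)

# Swap each placeholder name for its comma-separated list of Things.
templates = StringDataFrame["String"].str.replace(
    r"(?<=\{)([^{}]*)(?=\})",
    # Assumption: names missing from mapper are left untouched.
    lambda m: mapper.get(m.group(1), m.group(1)),
    regex=True,
)

print(templates.tolist())
```

The first template becomes "Jon likes {Cats,Dogs,Tigers,Llamas}.", which is the brace pattern that braceexpand then expands into one string per item.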
    

    This enables you to do funky stuff like:

    print(StringDataFrame)
    #                                  String
    # 0  Jon likes {animal} that eat {fruit}.
    
    print(ThingsDataFrame)
    #     Thing    Type
    # 0    Cats  animal
    # 1    Dogs  animal
    # 2  Apples   fruit
    # 3   Pears   fruit
    
    mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)
    
    (StringDataFrame['String']
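     # regex=True is needed for a callable replacement (pandas >= 2.0 defaults to regex=False)
     .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)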
     .apply(lambda x: list(braceexpand(x)))
     .explode()
    )
    
    # 0    Jon likes Cats that eat Apples.
    # 0     Jon likes Cats that eat Pears.
    # 0    Jon likes Dogs that eat Apples.
    # 0     Jon likes Dogs that eat Pears.
    # Name: String, dtype: object
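
If installing braceexpand is not an option, the same cross product could be sketched with re.split and itertools.product from the standard library. This is a different technique from the brace expansion above, not a drop-in replacement, and the names choices, PLACEHOLDER and expand are hypothetical.

```python
import itertools
import re

import pandas as pd

StringDataFrame = pd.DataFrame({"String": ["Jon likes {animal} that eat {fruit}."]})
ThingsDataFrame = pd.DataFrame({
    "Thing": ["Cats", "Dogs", "Apples", "Pears"],
    "Type": ["animal", "animal", "fruit", "fruit"],
})

# Type -> list of Things, e.g. {"animal": ["Cats", "Dogs"], ...}
choices = ThingsDataFrame.groupby("Type")["Thing"].agg(list).to_dict()

PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


def expand(template):
    """Return every string obtained by filling each {Type} placeholder."""
    # With one capture group, re.split alternates literal text (even
    # positions) and placeholder names (odd positions).
    parts = PLACEHOLDER.split(template)
    options = [
        # Unknown placeholder names are kept verbatim, braces included.
        choices.get(part, ["{" + part + "}"]) if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*options)]


out = StringDataFrame["String"].apply(expand).explode()
print(out)
```

Unlike brace expansion, this version does not treat commas inside a Thing as separators, since the items are never joined into a single string.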