Search code examples
pandasmeltpyjanitor

pyjanitor pivot_longer multiple sets of columns with common grouping variable and id column


Follow-up from my last question: regex column selection inside pd.melt function

I used the following code to pivot_longer columns that match the string "rating".

import pandas as pd
import janitor
(df
.pivot_longer(
    column_names = "*rating",
    names_to = ("category", ".value"))
)
    id category  rating
0    1    first     1.
1    1    first     2.0
2    1    first     3.0
3    2    first     1.0
4    2    first     2.0
5    2    first     3.0
6    3    first     1.0
7    3    first     2.0
8    3    first     3.0
9    1   second     2.8
10   1   second     2.9
11   1   second     2.2
12   2   second     2.0
13   2   second     1.8
14   2   second     1.9
15   3   second     2.2
16   3   second     2.3
17   3   second     2.1
18   1    third     3.4
19   1    third     3.8
20   1    third     2.9
21   2    third     3.2
22   2    third     2.8
23   2    third     2.4
24   3    third     3.3
25   3    third     3.4
26   3    third     2.9

Now, I also have columns that match the string "type" and "estimate" that have the prefix "first", "second", and "third".

df = pd.DataFrame({

    'id': [1, 1, 1],
    'first_rating': [1, 2, 3],
    'second_rating': [2.8, 2.9, 2.2],
    'third_rating': [3.4, 3.8, 2.9],
    'first_estimate': [1.2, 2.4, 2.8],
    'second_estimate': [2.4, 3, 2.4],
    'third_estimate':[3.4, 3.8, 2.9],
    'first_type': ['red', 'green', 'blue'],
    'second_type': ['red', 'green', 'yellow'],
    'third_type': ['red', 'red', 'blue'],
})

The header of my desired output is the following:

id category rating estimate type
1 first 1.0 1.2 'red'

I want to pivot longer the type and estimate columns too, while having only one 'id' and 'category' column. I know I could pivot longer for ratings, estimates, and types separately and then join the dfs, but I imagine there is a more efficient solution.

I tried to the following but it doesn't work:

(df
.pivot_longer(
    column_names = "*rating", 
    names_to = "category",
    values_to = "rating")
.pivot_longer(
    column_names = "*type", 
    names_to = "category",
    values_to = "type") 
.pivot_longer(
    column_names = "*estimate", 
    names_to = "category",  
    values_to = "estimate"))

Solution

  • The same concept as the previous solution applies, you just need to adapt the column names selection based on a regex or a shell glob (under the hood it uses fnmatch):

    # pip install pyjanitor
    import pandas as pd
    import janitor
    # import re - if you are using regex to select the column names
    (df
    .pivot_longer(
        # you can use a regex here instead
        # re.compile(".+_.+")
        column_names="*_*", 
        names_to = ("category", ".value"), 
        names_sep="_")
    )
    
       id category  rating  estimate    type
    0   1    first     1.0       1.2     red
    1   1    first     2.0       2.4   green
    2   1    first     3.0       2.8    blue
    3   1   second     2.8       2.4     red
    4   1   second     2.9       3.0   green
    5   1   second     2.2       2.4  yellow
    6   1    third     3.4       3.4     red
    7   1    third     3.8       3.8     red
    8   1    third     2.9       2.9    blue