Follow-up from my last question: pyjanitor pivot_longer multiple sets of columns with common grouping variable and id column
In my last question, the dataset I gave was oversimplified for the problem I was having. I have changed the column names to represent the ones in my dataset, as I couldn't figure out how to fix them myself using regex in pivot_longer
. In the model dataset I gave, columns were written with the following pattern: number_word
, but in my dataset the columns are in any order and never separated by underscores (e.g., wordnumber).
Note that the number needs to be the same grouping variable for each column set. So there should be a rating, estimate, and type for each number.
df = pd.DataFrame({
'id': [1, 1, 1],
'ratingfirst': [1, 2, 3],
'ratingsecond': [2.8, 2.9, 2.2],
'ratingthird': [3.4, 3.8, 2.9],
'firstestimate': [1.2, 2.4, 2.8],
'secondestimate': [2.4, 3, 2.4],
'thirdestimate':[3.4, 3.8, 2.9],
'firsttype': ['red', 'green', 'blue'],
'secondtype': ['red', 'green', 'yellow'],
'thirdtype': ['red', 'red', 'blue'],
})
The header of my desired output is the following:
id | category | rating | estimate | type |
---|---|---|---|---|
1 | first | 1.0 | 1.2 | 'red' |
I think the easiest way would be to align the columns you have with what was used in the previous question, something like:
def fix_col_header(s, d):
for word, word_replace in d.items():
s = s.replace(word, word_replace)
if s.startswith("_"):
s = s[len(word_replace):] + s[:len(word_replace)]
return s
d = {"first":"_first", "second":"_second", "third": "_third"}
df.columns = [fix_col_header(col, d) for col in df.columns]
This will give the columns:
id, rating_first, rating_second, rating_third, estimate_first, estimate_second, estimate_third, type_first, type_second, type_third
Now you can apply the solution from the previous question (note that category and value are switched). For completeness I have added it here:
import janitor
(df
.pivot_longer(
column_names="*_*",
names_to = (".value", "category"),
names_sep="_")
)