# PySpark: Transpose multiple rows into multiple columns

I want to transpose multiple rows of a dataframe into multiple columns, as shown below:

Initial df:

Code | Cat | Count | Value |
---|---|---|---|
S1 | A | 1 | 10 |
S1 | B | 2 | 15 |
S2 | A | 3 | 20 |
S2 | B | 4 | 25 |

Final df:

Code | Count A | Value A | Count B | Value B |
---|---|---|---|---|
S1 | 1 | 10 | 2 | 15 |
S2 | 3 | 20 | 4 | 25 |

With the pivot function I can transpose only one of Count and Value at a time:

`df = df.groupBy("Code").pivot("Cat").agg(f.first("Count"))`

Code | A | B |
---|---|---|
S1 | 1 | 2 |
S2 | 3 | 4 |

Is making two intermediate dataframes and then joining them the only way to do it?

Solution

There's no need to join; you can pass several aggregations to a single `agg` call after the pivot. Each output column is named `<pivot value>_<alias>`:

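```
from pyspark.sql import functions as F
# One pivot, two aggregations; each alias becomes a column-name suffix
result = df.groupBy('Code').pivot('Cat').agg(F.first('Count').alias('Count'),
                                             F.first('Value').alias('Value'))
```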

```
+----+-------+-------+-------+-------+
|Code|A_Count|A_Value|B_Count|B_Value|
+----+-------+-------+-------+-------+
|  S2|      3|     20|      4|     25|
|  S1|      1|     10|      2|     15|
+----+-------+-------+-------+-------+
```
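The pivoted columns come out as `A_Count`, `A_Value`, and so on, while the question asks for `Count A`, `Value A`. A minimal sketch of one way to build the target names follows. The helper `to_target_name` is a hypothetical name introduced here, not part of PySpark, and it assumes the key column is `Code` and that no `Cat` value contains an underscore.

```python
def to_target_name(col, key="Code"):
    """Turn a pivot output name like 'A_Count' into 'Count A'.

    Hypothetical helper for illustration; the key column is left unchanged.
    """
    if col == key:
        return col
    # Split on the first underscore only: pivot value first, alias second.
    cat, metric = col.split("_", 1)
    return f"{metric} {cat}"


# Column names as they appear in the pivoted result above.
pivot_columns = ["Code", "A_Count", "A_Value", "B_Count", "B_Value"]
new_names = [to_target_name(c) for c in pivot_columns]
print(new_names)
```

Given the `result` dataframe from the answer, the new names could then be applied in one step with `result.toDF(*[to_target_name(c) for c in result.columns])`.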
