Suppose I have such a dataframe in Pandas:
df = pd.DataFrame({'a':[4,4,8,8],'b':[4,5,6,5], 'd':[0,1,2,1]})
multi_idx = pd.MultiIndex.from_arrays([[0,0,1,1],[0,1,0,1]])
df.index= multi_idx
which outputs this shape:
a b d
0 0 4 4 0
1 4 5 1
1 0 8 6 2
1 8 5 1
You see that values of columns a
are repeated based on the first level index. I am looking for a way to avoid this repetition. One of course would be to split the information across more dataframes, i.e., having a data frame for column a
and first level index. However, I was wondering if there is somehow a way using multiindex and multi level columns, to create a columns whose values correspond to all rows of a higher level index.
Visually I would like something like that: is that possible in Pandas?
| | | a | b | c |
|----|----|-----|-----|-----|
|idx1|idx2| | | |
|----|----|-----|-----|-----|
| | 0 | | 4 | 0 |
| 0 |----| 4 |-----|-----|
| | 1 | | 5 | 1 |
|----|----|-----|-----|-----|
| | 0 | | 6 | 2 |
| 1 |----| 8 |-----|-----|
| | 1 | | 5 | 1 |
|----|----|-----|-----|-----|
This is unfortunately not possible, pandas doesn't have a concept of merged cells.
If you want to "merge" contiguous identical cells, you either have to use a MultiIndex to benefit from this display (only on the initial levels though) or to post-process the html output to change the rowspan of those cells manually.
Example to process the HTML, partially with pandas to compute the size of the rowspan and with BeautifulSoup
to modify the HTML:
from IPython.core.display import HTML
from IPython.display import display
from bs4 import BeautifulSoup
import re
def to_rowspan(s):
m = s.ne(s.groupby(level=0).shift())
sz = s.groupby(m.cumsum()).transform('size')
return [f'[rowspan={size}]{val}' if mask else 'TO_DELETE'
for size, mask, val in zip(sz, m, s)]
def merge_cells(df):
reg = re.compile(r'\[rowspan=(\d+)\](.*)')
soup = BeautifulSoup(df.apply(to_rowspan).to_html())
for row in soup.find_all('td'):
if row.text == 'TO_DELETE':
row.decompose()
elif (m:=reg.search(row.text)):
a,b = m.groups() ; a = int(a)
if a>1:
row['rowspan'] = a
row.string = b
return soup.prettify()
display(HTML(merge_cells(df)))
Output: