I have a dataframe like this (but with more columns and rows):
pd.DataFrame({
"Measure" : ["twitter", "news"],
"Upper quartile" : [0.1, 0.2],
"Lower quartile" : [0.05, 0.1],
"P-value1" : [0.05, 0.02],
"Upper decile" : [0.11, 0.22],
"Lower decile" : [0.03, 0.09],
"P-value2" : [0.04, 0.01]
})
And I need to kind of merge columns "Upper quartile", "Lower quartile" and "P-value1" into one column and "Upper decile", "Lower decile" and "P-value2" to another. In the end I would like to have a table like this (using sort of latex notation):
Measure & Upper quartile & Upper decile \\
{} & Lower quartile & Lower decile \\
{} & P-value1 & P-value2 \\
twitter & 0.1 & 0.11 \\
{} & 0.05 & 0.03 \\
{} & 0.05 & 0.04 \\
news & 0.2 & 0.22 \\
{} & 0.1 & 0.09 \\
{} & 0.02 & 0.01 \\
{} denotes empty cells
\\ denotes end of line
& is column separator
This part should be my header:
Measure & Upper quartile & Upper decile \\
{} & Lower quartile & Lower decile \\
{} & P-value1 & P-value2 \\
I have tried to set Multiindex, but it is not exactly what I need cause each level of multiindex relates to every value in given column, but this is not case of my target table. In my table, every row contains 3 values and only one of three headers relates to each of the values. I have also tried groupby and melt, but did not figure it out.
The reason for doing this: my table (df) is too wide when exported to overleaf, so I am trying to make it narrower so it fits on one page. I could do the table I need manually in overleaf, but I have quite a few tables like this, so I prefer preparing it in pandas and then exporting to latex.
Using a MultiIndex and reshaping, assuming your measurements are sorted and grouped in the desired order (e.g. Upper/Lower/P-Value)
# number of measurements per final column
N = 3
tmp = df.set_index('Measure')
M = tmp.shape[1]//N
cols = pd.MultiIndex.from_arrays(tmp.columns.to_numpy().reshape(-1, N).T,
names=['Measure']+['{}']*(N-1))
idx = (tmp.index.repeat(N).to_series()
.mask(lambda x: x.duplicated(), '{}')
)
def pad(s):
s = s.astype(str)
return s.str.ljust(s.str.len().max())
out = '\n'.join(pd.concat([
cols.to_frame(allow_duplicates=True, index=False).T,
pd.DataFrame(tmp.to_numpy().reshape(len(df), -1, N).swapaxes(1,2).reshape(-1, M),
index=idx),
])
.reset_index().astype(str)
.apply(pad) # optional
.agg(' & '.join, axis=1).add(r' \\')
)
print(out)
Output:
Measure & Upper quartile & Upper decile \\
{} & Lower quartile & Lower decile \\
{} & P-value1 & P-value2 \\
twitter & 0.1 & 0.11 \\
{} & 0.05 & 0.03 \\
{} & 0.05 & 0.04 \\
news & 0.2 & 0.22 \\
{} & 0.1 & 0.09 \\
{} & 0.02 & 0.01 \\
Output with optional padding:
Measure & Upper quartile & Upper decile \\
{} & Lower quartile & Lower decile \\
{} & P-value1 & P-value2 \\
twitter & 0.1 & 0.11 \\
{} & 0.05 & 0.03 \\
{} & 0.05 & 0.04 \\
news & 0.2 & 0.22 \\
{} & 0.1 & 0.09 \\
{} & 0.02 & 0.01 \\