For the life of me I can't figure out how to remove the duplicate index items created when writing a multiindex dataframe to CSV. While there is this answer out there, it doesn't apply to me per se because my second level has all different values.
This is a chunk of the dataframe I have, it just goes on for each month
Month Measure
JAN KWH
Max KW
Day/Hour
Peak End Use
Peak Pct
FEB KWH
Max KW
Day/Hour
Peak End Use
Peak Pct
But in CSV it looks something like
JAN KWH
JAN Max KW
JAN Day/Hour
JAN Peak End Use
JAN Peak Pct
FEB KWH
FEB Max KW
FEB Day/Hour
FEB Peak End Use
FEB Peak Pct
I hope for it to look exactly just like the dataframe so that means in CSV it would look like
Jan,KWH
,Max KW
,Day/Hour
and so on
I have tried df.index.to_series().duplicates()
but it doesn't work because each of my index pairs are not duplicates. I've also tried using index.tolist()
then looping through to change with no luck.
What's the right approach here?
Thanks in advance!
It seems pandas does not provide directly such functionality, perhaps to ensure that the generated csv files can be read back, as mentioned in the comments above. I faced with the same problem when generating a table for report in rst format. Finally, I decided to process the generated csv by an external script. The script replaces the repeating values in columns with spaces, thus achieving desired result. Note: this script assumes commas as csv separators.
# Clean csv with multiindices
from sys import argv
# Positional command line arguments:
# 1: original csv file
# 2: number of columns to check (optional)
with open(argv[1], 'r') as f:
if len(argv) > 2:
n = int(argv[2])
i0 = ('', ) * n
else:
n = None
i0 = None
for l in f:
if n is None:
# define n from the 1-st line
i1 = l.split(',')
n = len(i1)
i0 = ('', ) * n
i1 = l.split(',', n)
nl = ()
for s0, s1 in zip(i0, i1):
if s0 == s1:
e = ' ' * len(s0)
else:
e = s1
nl += (e, )
nl += tuple(i1[n:])
print ','.join(nl),
i0 = i1[:n]