Search code examples
pythonpandasmulti-index

How to perform string operation on an entire Pandas MultiIndex?


I have a pandas dataframe with a two-level column index. It's read in from a spreadsheet where the author used a lot of whitespace to accomplish things like alignment (for example, one column is called 'Tank #').

I've been able to remove the whitespace on the levels individually...

level0 = raw.columns.levels[0].str.replace('\s', '', regex=True)
level1 = raw.columns.levels[1].str.replace('\s', '', regex=True)
raw.columns.set_levels([level0, level1], inplace=True)

...but I'm curious if there is a way to do it without having to change each individual level one at a time.

I tried raw.columns.set_levels(raw.columns.str.replace('\s', '', regex=True) but got AttributeError: Can only use .str accessor with Index, not MultiIndex.

Here is a small sample subset of the data-- my best attempt at SO table formatting :D, followed by a picture where I've highlighted in yellow the indices as received.

Run Info Run Info Run Data Run Data
run # Tank # Step A conc. %
ph
0 6931 5 5.29 33.14
1 6932 1 5.28 33.13
2 6933 2 5.32 33.40
3 6934 3 5.19 32.98

sample data

Thanks for any insight!

Edit: adding to_dict()

df.to_dict()
Out[5]: 
{'Unnamed: 0': {0: nan, 1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0},
 'Run Info': {0: 'run #',
  1: '6931',
  2: '6932',
  3: '6933',
  4: '6934',
  5: '6935'},
 'Run Info.1': {0: 'Tank                             #',
  1: '5',
  2: '1',
  3: '2',
  4: '3',
  5: '4'},
 'Run Data': {0: 'Step A\npH',
  1: '5.29',
  2: '5.28',
  3: '5.32',
  4: '5.19',
  5: '5.28'},
 'Run Data.1': {0: 'concentration',
  1: '33.14',
  2: '33.13',
  3: '33.4',
  4: '32.98',
  5: '32.7'}}

Solution

  • How about rename:

    import re
    
    df.rename(columns=lambda x: re.sub('\s+', ' ', x.strip() ),inplace=True)
    

    If you don't want to keep any of the spaces, you can just replace ' ' with ''.