Search code examples
pythonpandasdataframereplacefillna

Pandas: Renaming "Unnamed: *" or "NaN" in data frame


Here's my code so far:

import numpy as np
import pandas as pd
df = pd.read_excel(r'file.xlsx', index_col=0)

Here's what it looks like: enter image description here

I want to rename the "Unnamed: *" columns to the last valid name.

Here's what I've tried and the results:

df.columns = df.columns.str.replace('Unnamed.*', method='ffill')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-253-c868b8bff7c7> in <module>()
----> 1 df.columns = df.columns.str.replace('Unnamed.*', method='ffill')

TypeError: replace() got an unexpected keyword argument 'method'

This "works" if I just do

df.columns = df.columns.str.replace('Unnamed.*', '')

But I then have either blank values or NaN (if I replace '' with 'NaN'. And then I try:

df.columns = df.columns.fillna('ffill')

Which has no effect. So I tried with inplace=True:

df.columns = df.columns.fillna('ffill', inplace=True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-279-cce486472d5b> in <module>()
----> 1 df.columns = df.columns.fillna('ffill', inplace=True)

TypeError: fillna() got an unexpected keyword argument 'inplace'

Then I tried a different way:

i = 0
while i < len(df.columns):
    if df.columns[i] == 'NaN':
        df.columns[i] = df.columns[i-1]
    print(df.columns[i])
    i += 1

Which gives me this Error:

Oil
158 RGN Mistura
Access West Winter Blend 

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-246-bc8fa6881b1a> in <module>()
      2 while i < len(df.columns):
      3     if df.columns[i] == 'NaN':
----> 4         df.columns[i] = df.columns[i-1]
      5     print(df.columns[i])
      6     i += 1

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   2048 
   2049     def __setitem__(self, key, value):
-> 2050         raise TypeError("Index does not support mutable operations")
   2051 
   2052     def __getitem__(self, key):

TypeError: Index does not support mutable operations

Solution

  • I did the following, i think it maintains the order you seek.

    df = pd.read_excel('book1.xlsx')
    print df
    
    
        a   b   c  Unnamed: 3  Unnamed: 4   d  Unnamed: 6   e  Unnamed: 8   f
    0  34  13  73         nan         nan  87         nan  76         nan  36
    1  70  48   1         nan         nan  88         nan   2         nan  77
    2  37  62  28         nan         nan   2         nan  53         nan  60
    3  17  97  78         nan         nan  69         nan  93         nan  48
    4  65  19  96         nan         nan  72         nan   4         nan  57
    5  63   6  86         nan         nan  14         nan  20         nan  51
    6  10  67  54         nan         nan  52         nan  48         nan  79
    
    
    df.columns = pd.Series([np.nan if 'Unnamed:' in x else x for x in df.columns.values]).ffill().values.flatten()
    print df
    
    
        a   b   c   c   c   d   d   e   e   f
    0  34  13  73 nan nan  87 nan  76 nan  36
    1  70  48   1 nan nan  88 nan   2 nan  77
    2  37  62  28 nan nan   2 nan  53 nan  60
    3  17  97  78 nan nan  69 nan  93 nan  48
    4  65  19  96 nan nan  72 nan   4 nan  57
    5  63   6  86 nan nan  14 nan  20 nan  51
    6  10  67  54 nan nan  52 nan  48 nan  79