Here's my code so far:
import numpy as np
import pandas as pd
df = pd.read_excel(r'file.xlsx', index_col=0)
I want to rename the "Unnamed: *" columns to the last valid name.
Here's what I've tried and the results:
df.columns = df.columns.str.replace('Unnamed.*', method='ffill')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-253-c868b8bff7c7> in <module>()
----> 1 df.columns = df.columns.str.replace('Unnamed.*', method='ffill')
TypeError: replace() got an unexpected keyword argument 'method'
This "works" if I just do
df.columns = df.columns.str.replace('Unnamed.*', '')
But I then have either blank values or NaN (if I replace '' with 'NaN'. And then I try:
df.columns = df.columns.fillna('ffill')
Which has no effect. So I tried with inplace=True:
df.columns = df.columns.fillna('ffill', inplace=True)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-279-cce486472d5b> in <module>()
----> 1 df.columns = df.columns.fillna('ffill', inplace=True)
TypeError: fillna() got an unexpected keyword argument 'inplace'
Then I tried a different way:
i = 0
while i < len(df.columns):
if df.columns[i] == 'NaN':
df.columns[i] = df.columns[i-1]
print(df.columns[i])
i += 1
Which gives me this Error:
Oil
158 RGN Mistura
Access West Winter Blend
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-246-bc8fa6881b1a> in <module>()
2 while i < len(df.columns):
3 if df.columns[i] == 'NaN':
----> 4 df.columns[i] = df.columns[i-1]
5 print(df.columns[i])
6 i += 1
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
2048
2049 def __setitem__(self, key, value):
-> 2050 raise TypeError("Index does not support mutable operations")
2051
2052 def __getitem__(self, key):
TypeError: Index does not support mutable operations
I did the following, i think it maintains the order you seek.
df = pd.read_excel('book1.xlsx')
print df
a b c Unnamed: 3 Unnamed: 4 d Unnamed: 6 e Unnamed: 8 f
0 34 13 73 nan nan 87 nan 76 nan 36
1 70 48 1 nan nan 88 nan 2 nan 77
2 37 62 28 nan nan 2 nan 53 nan 60
3 17 97 78 nan nan 69 nan 93 nan 48
4 65 19 96 nan nan 72 nan 4 nan 57
5 63 6 86 nan nan 14 nan 20 nan 51
6 10 67 54 nan nan 52 nan 48 nan 79
df.columns = pd.Series([np.nan if 'Unnamed:' in x else x for x in df.columns.values]).ffill().values.flatten()
print df
a b c c c d d e e f
0 34 13 73 nan nan 87 nan 76 nan 36
1 70 48 1 nan nan 88 nan 2 nan 77
2 37 62 28 nan nan 2 nan 53 nan 60
3 17 97 78 nan nan 69 nan 93 nan 48
4 65 19 96 nan nan 72 nan 4 nan 57
5 63 6 86 nan nan 14 nan 20 nan 51
6 10 67 54 nan nan 52 nan 48 nan 79