I have a dataframe where the header row and first rows look something like this:
index | 1 | 2 |
---|---|---|
0 | string1 | string2 |
1 | int_val1 | int_val2 |
I would instead like for it to look like this:
index | string1_1 | string2_2 |
---|---|---|
0 | int_val1 | int_val2 |
I suspect that there may be a more pandas-oriented way to go about this, but I cannot for the life of me find one. So I'm trying to treat the two rows concerned as lists, use the `zip()` function to build a new list, and use that as the header row. To accomplish this, I first try to convert the header row to strings, like so:
df = df.columns.astype(str)
new_headers = [x + y for x, y in zip(df.loc[0], df.columns)]
However, that returns `AttributeError: 'Index' object has no attribute 'loc'`, and it seems to be because I converted the original headers to strings, for reasons I don't fully understand.
Is there a better way to accomplish what I'm trying to do? Or is there something that I'm missing with the zip? For the sake of completing this task I'm going to manually rename the columns and then delete the extraneous row, but I'd like to know if there's a more Pythonic/Pandan way of doing this.
Edit: `index` is the index!
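Your error is due to `df = df.columns.astype(str)`, which rebinds `df` to the column Index (converted to strings) instead of the DataFrame. The following `df.loc[0]` is therefore called on an Index, which has no `loc`.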
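A minimal sketch of the difference, using the sample frame from the question (the name `cols` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({1: ['string1', 'int_val1'],
                   2: ['string2', 'int_val2']})

# astype(str) on the columns returns a new Index, not a DataFrame
cols = df.columns.astype(str)
print(type(cols).__name__)   # Index
print(hasattr(cols, 'loc'))  # False

# assigning the result back to df.columns keeps df a DataFrame
df.columns = cols
print(list(df.columns))      # ['1', '2']
```

So the conversion itself is fine; it is the assignment target (`df` versus `df.columns`) that causes the error.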
Assuming `index` is the index, you could use:
# update the columns
df.columns = list(map('_'.join, zip(df.iloc[0], df.columns.astype(str))))
# or
# df.columns = [x + '_' + y for x, y in zip(df.loc[0], df.columns.astype(str))]
# delete first row
df.drop(0, inplace=True)
# decrement index
df.index -= 1
print(df)
Output:
string1_1 string2_2
0 int_val1 int_val2
Used input:
import pandas as pd

df = pd.DataFrame({1: ['string1', 'int_val1'],
                   2: ['string2', 'int_val2']})
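If you would rather not modify `df` in place, the same steps could be wrapped in a small helper. This is only a sketch: the function name `merge_first_row_into_header` and the `sep` parameter are made up for illustration, and it assumes the first row holds the values to merge into the column names.

```python
import pandas as pd

def merge_first_row_into_header(df, sep='_'):
    """Return a new frame whose column names are '<first-row value><sep><old name>'."""
    # f-strings handle the int column labels, so no astype(str) is needed here
    new_cols = [f'{val}{sep}{col}' for val, col in zip(df.iloc[0], df.columns)]
    # drop the first row by position and renumber the index from 0
    out = df.iloc[1:].reset_index(drop=True)
    out.columns = new_cols
    return out

df = pd.DataFrame({1: ['string1', 'int_val1'],
                   2: ['string2', 'int_val2']})
result = merge_first_row_into_header(df)
print(result)
```

Using `iloc[1:]` with `reset_index(drop=True)` replaces the `drop` plus `df.index -= 1` steps and leaves the original `df` untouched.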