
Changing only header in text file before saving it in a new file in Python


I want to change only the header line of my data file before saving the formatted data to a new file. My data consists of floating-point values in exponential notation. I am using Spyder 3.2.6 with Python 3.6.4 (64-bit).

Here is the link to my data file. It is truncated: each of the R, L, G, C fields has 3001 rows, which adds up to 480062 rows in total in the original data file.

My data formatting code is below:

import pandas as pd

# create a DataFrame from the whitespace-separated file with columns freq and v
df = pd.read_csv('data.txt', sep=r"\s+", names=['freq', 'v'])

# boolean mask identifying the rows that hold the new column names
m = df['v'].str.endswith(')')
# new column: keep those names and forward-fill them over the NaNs
df['g'] = df['v'].where(m).ffill()
# remember the original order of the new columns
cols = df['g'].unique()
# remove the rows that have the same value in the v and g columns
df = df[df['v'] != df['g']]
# reshape by pivoting, then restore the column order with reindex
df = df.pivot(index='freq', columns='g', values='v').rename_axis(None, axis=1).reindex(columns=cols).reset_index()


df.to_csv('target.txt', index=False, sep='\t')

The output is saved as "target.txt", and its header line looks like this:

freq    R(1,1)  R(1,2)  R(2,1)  R(2,2)  L(1,1)  L(1,2)  L(2,1)  L(2,2)  G(1,1)  G(1,2)  G(2,1)  G(2,2)  C(2,2)  C(1,1)  C(1,2)  C(2,1)

Each column is separated by a tab. The header looks this way because the column names are taken directly from the strings in the input file.

For further processing, I want the header line to look slightly different. How can I change it to the header line below before the data is saved to the new file "target.txt"?

Freq    R1:1    R1:2    R2:1    R2:2    L1:1    L1:2    L2:1    L2:2    G1:1    G1:2    G2:1    G2:2    C1:1    C1:2    C2:1    C2:2

Solution

  • For this specific case, you could rename your headers like this before saving the DataFrame to CSV:

    df.columns = [x.replace('(','').replace(')','').replace(',',':') for x in df.columns]
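As a variation on the rename above, here is a minimal sketch that also capitalizes "freq" (the chained replace calls leave it lowercase) and changes only the written header, leaving the DataFrame's own column names untouched. The helper name tidy_header and the tiny sample frame are hypothetical stand-ins for the pivoted data; the sketch relies on the header argument of to_csv accepting a list of aliases, and writes to an in-memory buffer instead of "target.txt" so that it runs on its own.

```python
import io
import re

import pandas as pd


def tidy_header(name):
    """Turn 'R(1,2)' into 'R1:2'; capitalize any other name ('freq' -> 'Freq').

    Hypothetical helper, not part of pandas.
    """
    match = re.fullmatch(r"([A-Za-z]+)\((\d+),(\d+)\)", name)
    if match:
        letter, row, col = match.groups()
        return f"{letter}{row}:{col}"
    return name.capitalize()


# Hypothetical two-row frame standing in for the pivoted data
df = pd.DataFrame({
    "freq": [0.0, 1.0e3],
    "R(1,1)": [1.5e-2, 1.6e-2],
    "C(2,2)": [3.0e-11, 3.1e-11],
})

# Build the aliases once; df.columns itself is not modified
new_names = [tidy_header(c) for c in df.columns]

# Write to an in-memory buffer here; pass 'target.txt' instead to write the real file
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep="\t", header=new_names)

header_line = buffer.getvalue().splitlines()[0]
print(header_line)
```

Note that renaming does not reorder anything: in the header shown in the question, C(2,2) comes before C(1,1), so getting C1:1 first would need a separate reordering step (for example, reindexing the columns) before saving.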