python pandas dataframe data-manipulation dataformat

Formatting data in python

I want to format the data into a dataframe or file with specific format. I have extracted the data from ADS Keysight. But the data structure is not useful for further processing. The data file has R(1,1) to R(2,2), L(1,1) to L(2,2), G(1,1) to G(2,2), C(1,1) to C(2,2) data for different frequencies. The file has only 2 columns 1st column is "Freq" and 2nd column is either one of the RLGC data. I have given an example here only foe 3 frequency for each of the data. And the target format as well.

The real file contains 3000 frequencies of data.

The data file is (The two columns are tab separated):

freq    R(1,1)
0.00E+00    5.11E+00
1.00E+06    5.59E+00
2.00E+06    5.69E+00


freq    R(1,2)
0.00E+00    7.13E-13
1.00E+06    9.82E-03
2.00E+06    2.47E-02


freq    R(2,1)
0.00E+00    7.13E-13
1.00E+06    9.82E-03
2.00E+06    2.47E-02


freq    R(2,2)
0.00E+00    5.11E+00
1.00E+06    5.59E+00
2.00E+06    5.69E+00


freq    L(1,1)
0.00E+00    6.61E-07
1.00E+06    6.97E-07
2.00E+06    7.06E-07


freq    L(1,2)
0.00E+00    4.46E-07
1.00E+06    4.47E-07
2.00E+06    4.48E-07


freq    L(2,1)
0.00E+00    4.46E-07
1.00E+06    4.47E-07
2.00E+06    4.48E-07


freq    L(2,2)
0.00E+00    6.61E-07
1.00E+06    6.97E-07
2.00E+06    7.06E-07


freq    G(1,1)
0.00E+00    1.04E-17
1.00E+06    6.42E-07
2.00E+06    1.29E-06


freq    G(1,2)
0.00E+00    -5.02E-18
1.00E+06    -3.11E-07
2.00E+06    -6.23E-07


freq    G(2,1)
0.00E+00    -5.02E-18
1.00E+06    -3.11E-07
2.00E+06    -6.23E-07


freq    G(2,2)
0.00E+00    1.04E-17
1.00E+06    6.42E-07
2.00E+06    1.29E-06


freq    C(1,1)
0.00E+00    5.58E-11
1.00E+06    5.54E-11
2.00E+06    5.53E-11


freq    C(1,2)
0.00E+00    -3.27E-11
1.00E+06    -3.25E-11
2.00E+06    -3.25E-11


freq    C(2,1)
0.00E+00    -3.27E-11
1.00E+06    -3.25E-11
2.00E+06    -3.25E-11


freq    C(2,2)
0.00E+00    5.58E-11
1.00E+06    5.54E-11
2.00E+06    5.53E-11

The target file should look like below (The columns are also tab separated):

freq    R(1,1)  R(1,2)  R(2,1)  R(2,2)  L(1,1)  L(1,2)  L(2,1)  L(2,2)  G(1,1)  G(1,2)  G(2,1)  G(2,2)  C(1,1)  C(1,2)  C(2,1)  C(2,2)
0.00E+00    5.11E+00    7.13E-13    7.13E-13    5.11E+00    6.61E-07    4.46E-07    4.46E-07    6.61E-07    1.04E-17    -5.02E-18   -5.02E-18   1.04E-17    5.58E-11    -3.27E-11   -3.27E-11   5.58E-11
1.00E+06    5.59E+00    9.82E-03    9.82E-03    5.59E+00    6.97E-07    4.47E-07    4.47E-07    6.97E-07    6.42E-07    -3.11E-07   -3.11E-07   6.42E-07    5.54E-11    -3.25E-11   -3.25E-11   5.54E-11
2.00E+06    5.69E+00    2.47E-02    2.47E-02    5.69E+00    7.06E-07    4.48E-07    4.48E-07    7.06E-07    1.29E-06    -6.23E-07   -6.23E-07   1.29E-06    5.53E-11    -3.25E-11   -3.25E-11   5.53E-11

The target format can be a dataframe in python which I can manipulate further. But Can anyone help me how to do it with pandas?

I am using Anaconda (Spyder) for python package (which includes python 3.6.4).

Solution

Use:

#create DataFrame from csv with columns f and v 
df = pd.read_csv(filename, sep="\s+", names=['freq','v'])

#boolean mask for identify columns of new df   
m = df['v'].str.endswith(')')
#new column by replace NaNs by forward filling
df['g'] = df['v'].where(m).ffill()
#get original ordering for new columns
cols = df['g'].unique()
#remove rows with same values in v and g columns
df = df[df['v'] != df['g']]
#reshape by pivoting with change ordering of columns by reindex
df = df.pivot('freq', 'g', 'v').rename_axis(None, axis=1).reindex(columns=cols).reset_index()
print (df)
       freq    R(1,1)    R(1,2)    R(2,1)    R(2,2)    L(1,1)    L(1,2)  \
0  0.00E+00  5.11E+00  7.13E-13  7.13E-13  5.11E+00  6.61E-07  4.46E-07   
1  1.00E+06  5.59E+00  9.82E-03  9.82E-03  5.59E+00  6.97E-07  4.47E-07   
2  2.00E+06  5.69E+00  2.47E-02  2.47E-02  5.69E+00  7.06E-07  4.48E-07   

     L(2,1)    L(2,2)    G(1,1)     G(1,2)     G(2,1)    G(2,2)    C(1,1)  \
0  4.46E-07  6.61E-07  1.04E-17  -5.02E-18  -5.02E-18  1.04E-17  5.58E-11   
1  4.47E-07  6.97E-07  6.42E-07  -3.11E-07  -3.11E-07  6.42E-07  5.54E-11   
2  4.48E-07  7.06E-07  1.29E-06  -6.23E-07  -6.23E-07  1.29E-06  5.53E-11   

      C(1,2)     C(2,1)    C(2,2)  
0  -3.27E-11  -3.27E-11  5.58E-11  
1  -3.25E-11  -3.25E-11  5.54E-11  
2  -3.25E-11  -3.25E-11  5.53E-11

Last for remove index values in output csv by to_csv use index=False parameter:

df.to_csv(file, index=False, sep='\t')