python pandas text split

How to split the data into two columns and rename them


Good day, I have some questions and would like some suggestions about the code I wrote. I am reading a text file (100m.txt) with the code below.

Code

import pandas as pd
import numpy as np
filename = '100m.txt'
#read text file into pandas DataFrame 
df = pd.read_csv(filename, sep="\t")
#drop the unwanted rows
df.drop(df.index[0:2], inplace=True)  

#display DataFrame
print(df)

This is the output I obtain before I drop the rows:

#"Frequency / MHz"                                  S2,1 [SPara1] [Magnitude in dB]
#----------------------------------------------...                              NaN
0.13800000000000                                                  -0.86382723874573
0.14231250000000                                                  -0.87087313013279
0.14662500000000                                                  -0.87829259771590
...                                                                             ...
17.646750000000                                                    -7.4030582653105
17.651062500000                                                    -7.4207444253551
17.655375000000                                                    -7.4408195390888
17.659687500000                                                    -7.4589436977625
17.664000000000                                                    -7.4799578201591

[4067 rows x 1 columns]


After I drop the rows, a '#' is somehow still present in the output, and all the data is in one column rather than the two I expected.

                                 #
0.13800000000000  -0.86382723874573
0.14231250000000  -0.87087313013279
0.14662500000000  -0.87829259771590
0.15093750000000  -0.88573009666901
0.15525000000000  -0.89247663245258
    
[4065 rows x 1 columns]

How can I split the data into two columns and rename the columns? Thank you in advance.


Solution

  • You should handle this when parsing the file into the DataFrame. Notice that it says [4067 rows x 1 columns] when you print the DataFrame before attempting to drop anything? The three non-data lines at the top of the file are included when the CSV is loaded, and they prevent the data from being split properly with your specified '\t' separator.

    Example code:

    df = pd.read_csv(
        filename,
        sep="\t",
        skiprows=3,
        names=['frequency', 'magnitude'],
    )
    

    You can also pass in your own column names with names, as shown in the example above, because those lines are disregarded entirely when the data is parsed.

    This results in a DataFrame with two columns, frequency and magnitude.
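To try the approach without the original file, here is a minimal sketch that reads an inline sample instead of 100m.txt. The sample text is an assumption: it imitates three '#' header lines followed by tab-separated values taken from the question, and the real file's header may look different. The second call uses comment="#", which is not part of the answer above; it is an alternative that should work only if every non-data line starts with '#'.

```python
import io

import pandas as pd

# Hypothetical stand-in for '100m.txt': three header lines starting with '#',
# then tab-separated data. The exact header text is an assumption.
sample = (
    "#\n"
    '#"Frequency / MHz"\tS2,1 [SPara1] [Magnitude in dB]\n'
    "#----------------------------------------\n"
    "0.13800000000000\t-0.86382723874573\n"
    "0.14231250000000\t-0.87087313013279\n"
    "0.14662500000000\t-0.87829259771590\n"
)

columns = ["frequency", "magnitude"]

# Approach from the answer: skip the three header lines and name the columns.
df = pd.read_csv(io.StringIO(sample), sep="\t", skiprows=3, names=columns)

# Alternative (an assumption, not from the answer): treat every line that
# starts with '#' as a comment, so the number of header lines does not matter.
df_alt = pd.read_csv(
    io.StringIO(sample), sep="\t", comment="#", header=None, names=columns
)

print(df)
print(df.dtypes)
```

With the real file, replace io.StringIO(sample) with filename. Both columns are parsed as floats, so no further conversion should be needed before plotting or computing with them.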