Tags: pandas, dataframe, csv, nan, txt

Pandas - Passing working values from read_csv to a DataFrame turns everything to NaN, why?


I'm working on a script which reads a bunch of txt files into pandas for data processing. A sample of a text file would be:

0000000000000e+000,0.05844309,0.05078511
5000000000000e-001,0.05802771,0.01336614
0000000000000e-001,0.1123048,0.008524402
5000000000000e-001,0.1359783,0.005294179
0000000000000e+000,0.1028109,0.004224583
2500000000000e+000,0.1182408,0.005825941

My code is:

import os
import pandas as pd

os.chdir(ProcessedDataPath)  # path to the overall folder
PandasFilePath = 'Run_Data00001.txt'  # the data file I'm reading
Data_RAW = pd.read_csv(PandasFilePath, header=None)
Data_RAW = Data_RAW.astype(float)
Data_Frame = pd.DataFrame(Data_RAW, columns=["Hz", "N", "m/s2"])

It doesn't throw any errors and the column names are correct, but all the values in Data_Frame are NaN, even though all the values read into Data_RAW are correct. I'm checking the values and data types in the Anaconda Variable Explorer.

I've tried removing NaN values or changing the data types, but nothing seems to read into Data_Frame properly.


Solution

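  • When you pass a DataFrame together with columns=, pandas selects columns by label instead of renaming them. The columns of Data_RAW are labelled 0, 1 and 2, so none of "Hz", "N" or "m/s2" matches and every value comes back as NaN. Pass the underlying NumPy array to the pd.DataFrame constructor instead of the DataFrame: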

    Data_Frame = pd.DataFrame(Data_RAW.values, columns=["Hz", "N", "m/s2"])
    

    Printing Data_Frame then gives:

                 Hz         N      m/s2
    0  0.000000e+00  0.058443  0.050785
    1  5.000000e+11  0.058028  0.013366
    2  0.000000e+00  0.112305  0.008524
    3  5.000000e+11  0.135978  0.005294
    4  0.000000e+00  0.102811  0.004225
    5  2.500000e+12  0.118241  0.005826
    

    I recommend just setting the .columns attribute of Data_RAW instead:

    Data_RAW.columns =  ["Hz", "N", "m/s2"]
    print(Data_RAW)
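
To see the behaviour side by side, here is a minimal sketch that reproduces the all-NaN result and compares the fixes. It makes a few assumptions that are not in the original post: the inline string `text` stands in for Run_Data00001.txt (first three sample rows only), the variable names `raw`, `broken`, `fixed`, `renamed` and `named` are purely illustrative, and the last variant, using the `names` parameter of `read_csv`, is an additional option rather than something the answer above proposes.

```python
import io

import pandas as pd

# Inline stand-in for Run_Data00001.txt (assumption: first three sample rows).
text = (
    "0000000000000e+000,0.05844309,0.05078511\n"
    "5000000000000e-001,0.05802771,0.01336614\n"
    "0000000000000e-001,0.1123048,0.008524402\n"
)
names = ["Hz", "N", "m/s2"]

# With header=None the columns get the integer labels 0, 1, 2.
raw = pd.read_csv(io.StringIO(text), header=None).astype(float)

# DataFrame + columns= looks the requested labels up in `raw`.
# None of them exist there, so the result is filled with NaN.
broken = pd.DataFrame(raw, columns=names)

# A plain NumPy array carries no labels, so columns= simply names
# the positions and the values survive.
fixed = pd.DataFrame(raw.to_numpy(), columns=names)

# Renaming in place (done on a copy here so `raw` stays untouched).
renamed = raw.copy()
renamed.columns = names

# Naming the columns while reading, which skips the extra step entirely.
named = pd.read_csv(io.StringIO(text), header=None, names=names)

print(broken.isna().all().all())  # every cell of `broken` is NaN
print(fixed)
```

If the column names are known before the files are read, the `names=` variant is probably the least error-prone, since no intermediate DataFrame with integer labels is ever created.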