python csv gps lora

Cleaning a GPS data CSV file with Python


I am working on a real-time LoRa GPS plotter in Python. I receive GPS position data from LoRa over a serial connection and save it to a CSV file. This works, but a bad LoRa signal sometimes corrupts the GPS data, like so:

45.830008,16.039911
45.830mdndn008,16.039911
45.830008,16.039oo9°1
45.830008,16.03991
45.830sj008,16.03#=991

How can I read only the valid numeric data from the CSV file in Python? I don't want to read the corrupted data, only well-formed numeric values. I am using pandas to read the CSV file, like so:

data = pd.read_csv(self.data_path, names=['LATITUDE', 'LONGITUDE'], sep=',')


gps_data = tuple(zip(data['LATITUDE'].values, data['LONGITUDE'].values))

Solution

  • To keep only the numeric values, use pd.to_numeric() with errors='coerce'. Invalid values are then set to NaN; see the pandas documentation for to_numeric.

    import pandas as pd

    # read the CSV; the file has no header row, so the column names are given explicitly
    # (self.data_path is the CSV path attribute from the question's class)
    data = pd.read_csv(self.data_path, names=['LATITUDE', 'LONGITUDE'], sep=',')

    # convert each column to numbers; values that cannot be parsed become NaN
    gps_data = pd.DataFrame({
        'LATITUDE': pd.to_numeric(data['LATITUDE'], errors='coerce'),
        'LONGITUDE': pd.to_numeric(data['LONGITUDE'], errors='coerce'),
    })

    print(data)
    print(gps_data)

    # drop rows with NaN in latitude OR longitude (how='any' is the default)
    gps_data = gps_data.dropna(subset=['LATITUDE', 'LONGITUDE'])
    # equivalent here: thresh=2 keeps only rows with at least 2 non-NaN values
    #gps_data = gps_data.dropna(subset=['LATITUDE', 'LONGITUDE'], thresh=2)


    Edit: you will probably also want to drop the rows that contain NaN values, as in the last step of the code.
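
As an end-to-end sketch of the approach above, the following applies the same to_numeric and dropna steps to the sample rows from the question and builds the tuple of coordinates the question asks for. The load_clean_gps helper and the use of io.StringIO in place of the real file are assumptions made for illustration, and dtype=str is an added choice so that every field is parsed in one place.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real CSV file.
raw = io.StringIO(
    "45.830008,16.039911\n"
    "45.830mdndn008,16.039911\n"
    "45.830008,16.039oo9°1\n"
    "45.830008,16.03991\n"
    "45.830sj008,16.03#=991\n"
)


def load_clean_gps(source):
    """Hypothetical helper: return (lat, lon) pairs for rows where both fields are numeric."""
    # Read everything as text so that parsing happens in a single, explicit step.
    data = pd.read_csv(source, names=['LATITUDE', 'LONGITUDE'], sep=',', dtype=str)
    # Unparseable values become NaN, then any row containing a NaN is dropped.
    cleaned = data.apply(pd.to_numeric, errors='coerce').dropna()
    return tuple(zip(cleaned['LATITUDE'], cleaned['LONGITUDE']))


points = load_clean_gps(raw)
print(points)
```

With these sample rows, only the first and fourth lines survive. Note that the fourth line (16.03991) may itself be a truncated value; a numeric check cannot detect a dropped digit, so catching that would need a separate plausibility check.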