Python: Efficiently converting latitude from ddmm.ssss to degrees


I am converting a text file into netCDF format. I read the data from the text file into a pandas DataFrame, two of whose columns are latitude_GPS and longitude_GPS. The input looks like:

latitude_GPS, longitude_GPS
7537.4536, 3558.4985
7672.1534, 3214.9532

They are in ddmm.ssss format: in a value like 7537.4536, '75' is the degrees, '37' is the minutes and '4536' is the seconds. I want to convert them to decimal degrees, except for missing values, which are stored as 999.0.
My current code looks like this:

header_rows = 1

df = pd.read_csv(args.input_file, delim_whitespace=True, skiprows=header_rows, skip_blank_lines=True, names = column_names)

num_rows = sum(1 for line in open(args.input_file) if len(line.strip()) != 0) - header_rows

def lat_lon_gps(col_index):
    return ((int(col_index)/100) + round((int(col_index%100))/60, 4) + round(round(col_index%1, 4)/3600, 4))

check_na = 999.0

i = 0
while i < num_rows:
    if df['latitude_GPS'][i] != check_na:
        df['latitude_GPS'][i] = lat_lon_gps(df['latitude_GPS'][i])

    if df['longitude_GPS'][i] != check_na:
        df['longitude_GPS'][i] = lat_lon_gps(df['longitude_GPS'][i])

    i += 1  # advance to the next row

The return statement calculates (75 + 37/60 + 4536/3600). The code above returns what I want, but this part takes around 50 minutes to run for a file with 10000 rows. Is there a faster way to do it? Any thoughts would be appreciated.


Solution

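  • The problem is that you are iterating over every row in a Python loop and assigning one element at a time. You should take advantage of the vectorisation provided by pandas and NumPy, which applies the arithmetic to a whole column at once.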

    For example:

    import numpy as np
    import pandas as pd
    
    df = pd.read_csv(args.input_file,
                     names=['latitude_GPS','longitude_GPS'],
                     skiprows=1)
    check_na = 999.0
    
    def lat_lon_gps(coords):
        deg = np.floor(coords / 100)
        minutes = np.floor(((coords / 100) - deg) * 100)
        seconds = (((coords / 100) - deg) * 100 - minutes) * 100
        return deg + minutes / 60 + seconds / 3600
    
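    # Exclude NAs (this drops every row whose latitude is missing)
    logic = df.latitude_GPS != check_na
    df = df[logic].copy()  # copy so the assignments below modify a real frame
    
    df.latitude_GPS = lat_lon_gps(df.latitude_GPS)
    df.longitude_GPS = lat_lon_gps(df.longitude_GPS)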
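
The code above removes rows whose latitude equals 999.0 and does not check longitude. If the missing values should stay in place as 999.0, as the question asks, one possible variant is sketched below. It makes several assumptions: the names `ddmmss_to_degrees` and `convert_column` and the sample DataFrame are made up for illustration, the digits after the decimal point are read as ss.ss (45.36 seconds for 7537.4536, the same reading as `lat_lon_gps` above), and all coordinates are non-negative.

```python
import numpy as np
import pandas as pd

CHECK_NA = 999.0  # sentinel for missing values, as in the question


def ddmmss_to_degrees(coords):
    """Convert ddmm.ssss values to decimal degrees, element-wise.

    Assumes non-negative input and that the fractional digits are ss.ss.
    """
    deg = np.floor(coords / 100)       # 7537.4536 -> 75
    minutes = np.floor(coords % 100)   # 7537.4536 -> 37
    seconds = (coords % 1) * 100       # 7537.4536 -> about 45.36
    return deg + minutes / 60 + seconds / 3600


def convert_column(series, na_value=CHECK_NA):
    """Convert a whole column, leaving sentinel entries unchanged."""
    # Series.where keeps the original value where the condition is True
    # (the sentinel) and takes the converted value everywhere else.
    return series.where(series == na_value, ddmmss_to_degrees(series))


# Hypothetical sample data; one latitude and one longitude are missing.
df = pd.DataFrame({
    "latitude_GPS": [7537.4536, 999.0, 7672.1534],
    "longitude_GPS": [3558.4985, 3214.9532, 999.0],
})

for col in ["latitude_GPS", "longitude_GPS"]:
    df[col] = convert_column(df[col])
```

Each column is masked independently, so a row with only one missing coordinate keeps its other, converted value, and the number of rows does not change.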