Search code examples
pythonpandasgeolocationcoordinatescoordinate-transformation

Converting pandas data frame with degree minute second (DMS) coordinates to decimal degrees


I have a data frame like the below and would like to convert the Latitude and Longitude columns in Degree, Minute, Second format into decimal degrees - with negative for the correct hemisphere. Is there an easy way to do that?

Parent Company  CPO PKO Latitude    Longitude
Incasi Raya X       0°51'56.29"S    101°26'46.29"E
Incasi Raya X       1°23'39.29"S    101°35'30.45"E
Incasi Raya X       0°19'56.63"N    99°22'56.36"E
Incasi Raya X       0°21'45.91"N    99°37'59.68"E
Incasi Raya X       1°41'6.56"S 102°14'7.68"E
Incasi Raya X       1°15'2.13"S 101°34'30.38"E
Incasi Raya X       2°19'44.26"S    100°59'34.55"E
Musim Mas   X       1°44'55.94"N    101°22'15.94"E

For example 0°51'56.29"S would be converted to -0.8656361


Solution

  • Basing my answer on a function from SO you can do it like this:

    Interestingly this answer is also 2x as fast as MaxU and Amis answer for a dataset with +500 rows. My bet is that the bottleneck is str.extract(). But something is clearly strange.

    import pandas as pd
    import re
    
    #https://stackoverflow.com/questions/33997361
    def dms2dd(s):
        # example: s = """0°51'56.29"S"""
        degrees, minutes, seconds, direction = re.split('[°\'"]+', s)
        dd = float(degrees) + float(minutes)/60 + float(seconds)/(60*60);
        if direction in ('S','W'):
            dd*= -1
        return dd
    
    df = pd.DataFrame({'CPO': {0: 'Raya', 1: 'Raya'},
     'Latitude': {0: '0°51\'56.29"S', 1: '1°23\'39.29"S'},
     'Longitude': {0: '101°26\'46.29"E', 1: '101°35\'30.45"E'},
     'PKO': {0: 'X', 1: 'X'},
     'ParentCompany': {0: 'Incasi', 1: 'Incasi'}})
    
    df['Latitude'] = df['Latitude'].apply(dms2dd)
    df['Longitude'] = df['Longitude'].apply(dms2dd)
    

    printing df returns:

        CPO   Latitude   Longitude PKO ParentCompany
    0  Raya  -0.865636  101.446192   X        Incasi
    1  Raya  -1.394247  101.591792   X        Incasi
    

    Update: To correct your mistake you could do something in the lines of:

    m = df['Latitude'].str[-2] != '"'
    df.loc[m, 'Latitude'] = df.loc[m, 'Latitude'].str[:-1] + '"' + df.loc[m, 'Latitude'].str[-1]
    

    Full example:

    import re
    
    s1 = """0°51'56.29"S"""
    s2 = """0°51'56.29S"""
    
    df = pd.Series((s1,s2)).to_frame(name='Latitude')
    
    m = df['Latitude'].str[-2] != '"'
    df.loc[m, 'Latitude'] = df.loc[m, 'Latitude'].str[:-1] + '"' + df.loc[m, 'Latitude'].str[-1]
    
    print(df)