python pandas latitude-longitude

How to bring DataFrame values into a certain range


Here is my dataset

Address                                                 Latitude        Longitude
Bandar Udara Sultan Aji Muhammad Sulaiman               -12.658.151     1.168.977.258
Bandar Udara Halim Perdanakusuma                        -62.652.088     1.068.863.459
Bandar Udara Internasional Sultan Mahmud Badaruddin II  -28.947.992     1.047.046.471
Bandar Udara Internasional Zainuddin Abdul Madjid        -8.761.939     1.162.735.566
Bandar Udara Internasional Sultan Syarif Kasim II         4.645.874     1.014.477.217

Latitude should be between -10 and 10, and Longitude between 100 and 200, so I want to rescale the values above into standard latitude and longitude values.

Here's my expected dataset

Address                                                 Latitude        Longitude
Bandar Udara Sultan Aji Muhammad Sulaiman                -1.2658151     116.8977258
Bandar Udara Halim Perdanakusuma                         -6.2652088     106.8863459
Bandar Udara Internasional Sultan Mahmud Badaruddin II   -2.8947992     104.7046471
Bandar Udara Internasional Zainuddin Abdul Madjid        -8.761939      116.2735566
Bandar Udara Internasional Sultan Syarif Kasim II         4.645874      101.4477217

Solution

  • This is a very custom solution that relies on your coordinates falling in known ranges: specifically, longitude always has three digits before the decimal point, and latitude always has one (optionally preceded by a minus sign).

    Basically, you first remove all the dots, then insert a single dot after the third digit for longitude and after the first digit for latitude.

    import numpy as np
    import pandas as pd
    
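    # Longitude: drop every dot, then put one back after the third digit
    # (assumes both coordinate columns are stored as strings)
    df['Longitude'] = df['Longitude'].str.replace(r'\.', '', regex=True)
    df['Longitude'] = (df['Longitude'].str[:3] + '.' + df['Longitude'].str[3:]).astype(float)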
    
    # Latitude: remember the sign, drop dots and minus, then put one dot after the first digit
    negative_lat = df['Latitude'].str.startswith('-')
    df['Latitude'] = df['Latitude'].str.replace(r'\.|-', '', regex=True)
    df['Latitude'] = (df['Latitude'].str[:1] + '.' + df['Latitude'].str[1:]).astype(float)
    # Restore the sign on the rows that were negative
    df['Latitude'] = np.where(negative_lat, -1 * df['Latitude'], df['Latitude'])
    
    print(df.dtypes)
    print(df)
    # Address       object
    # Latitude     float64
    # Longitude    float64
    # dtype: object
    #                                              Address  Latitude   Longitude
    # 0          Bandar Udara Sultan Aji Muhammad Sulaiman -1.265815  116.897726
    # 1                   Bandar Udara Halim Perdanakusuma -6.265209  106.886346
    # 2  Bandar Udara Internasional Sultan Mahmud Badar... -2.894799  104.704647
    # 3  Bandar Udara Internasional Zainuddin Abdul Madjid -8.761939  116.273557
    # 4  Bandar Udara Internasional Sultan Syarif Kasim II  4.645874  101.447722
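
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample data from the question and wraps the "strip the dots, then re-insert one" step in a helper. The helper name `fix_coordinate` and its `int_digits` parameter are made up for illustration and are not part of pandas; the sketch also assumes both columns arrive as strings.

```python
import pandas as pd


def fix_coordinate(values: pd.Series, int_digits: int) -> pd.Series:
    """Hypothetical helper: strip misplaced dots and keep `int_digits`
    digits before a single decimal point, preserving a leading minus."""
    text = values.astype(str).str.strip()
    negative = text.str.startswith('-')
    # Remove every dot and the minus sign, leaving only the digits
    digits = text.str.replace(r'[.-]', '', regex=True)
    fixed = (digits.str[:int_digits] + '.' + digits.str[int_digits:]).astype(float)
    # Keep the value where the original was non-negative, negate it otherwise
    return fixed.where(~negative, -fixed)


# Sample data copied from the question
df = pd.DataFrame({
    'Address': [
        'Bandar Udara Sultan Aji Muhammad Sulaiman',
        'Bandar Udara Halim Perdanakusuma',
        'Bandar Udara Internasional Sultan Mahmud Badaruddin II',
        'Bandar Udara Internasional Zainuddin Abdul Madjid',
        'Bandar Udara Internasional Sultan Syarif Kasim II',
    ],
    'Latitude': ['-12.658.151', '-62.652.088', '-28.947.992',
                 '-8.761.939', '4.645.874'],
    'Longitude': ['1.168.977.258', '1.068.863.459', '1.047.046.471',
                  '1.162.735.566', '1.014.477.217'],
})

df['Latitude'] = fix_coordinate(df['Latitude'], int_digits=1)
df['Longitude'] = fix_coordinate(df['Longitude'], int_digits=3)
print(df)
```

The same caveat applies as above: this only works while every latitude has exactly one digit before the decimal point and every longitude has three. A latitude of 10 degrees or more, or a longitude below 100, would need a different rule.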