
pandas - apply UTM function to dataframe columns


I'm working with a Python package called utm, which converts WGS84 coordinates to UTM and vice versa. I would like to apply this function to a pandas dataframe. The function works as follows:

utm.from_latlon(51.2, 7.5)
>>> (395201.3103811303, 5673135.241182375, 32, 'U')

where the input is a latitude/longitude pair, and the output is a tuple describing the same point in the UTM system (easting, northing, zone number, zone letter). For my purposes I'm interested only in the first two elements of the tuple.

I'm working on a DataFrame called cities that looks like this:

City;Latitude;Longitude;minx;maxx;miny;maxy
Roma;41.892916;12.48252;11.27447419;13.69056581;40.99359439;42.79223761
Paris;48.856614;2.352222;0.985506011;3.718937989;47.95729239;49.75593561
Barcelona;41.385064;2.173403;0.974836927;3.371969073;40.48574239;42.28438561
Berlin;52.519171;13.406091;11.92835553;14.88382647;51.61984939;53.41849261
Moscow;55.755826;37.6173;36.01941671;39.21518329;54.85650439;56.65514761

and I would like to add four columns to each row, called 'utmminx','utmmaxx','utmminy','utmmaxy', holding the result of applying the utm function to the 'minx','maxx','miny','maxy' columns. So far I have tried the following, assigning the first and the second value of the resulting tuple to the new columns:

cities['utmminx'],cities['utmmaxx'] = utm.from_latlon(cities['minx'],cities['maxx'])[0],utm.from_latlon(cities['minx'],cities['maxx'])[1]

but I received a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Passing only the first row's values to the function works:

utm.from_latlon(cities['minx'][0],cities['maxx'][0])[0],utm.from_latlon(cities['minx'][0],cities['maxx'][0])[1]
>>> (357074.7837193568, 1246647.7959235134)

I would like to avoid explicit loops over the dataframe, as I assume there is an idiomatic pandas method for this.


Solution

  • Starting with your frame

            City   Latitude  Longitude       minx       maxx       miny       maxy
    0       Roma  41.892916  12.482520  11.274474  13.690566  40.993594  42.792238
    1      Paris  48.856614   2.352222   0.985506   3.718938  47.957292  49.755936
    2  Barcelona  41.385064   2.173403   0.974837   3.371969  40.485742  42.284386
    3     Berlin  52.519171  13.406091  11.928356  14.883826  51.619849  53.418493
    4     Moscow  55.755826  37.617300  36.019417  39.215183  54.856504  56.655148
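
Passing whole columns to utm.from_latlon() fails because the function is written for single numbers. The error message comes from pandas itself: it is raised whenever a Series is used where one True/False is expected, which is presumably what happens in a scalar range check on the inputs. A minimal sketch of that failure mode, where check_latitude is a hypothetical stand-in and not utm's actual code:

```python
import pandas as pd

def check_latitude(latitude):
    # Hypothetical scalar range check, for illustration only (not utm's code).
    if not -80.0 <= latitude <= 84.0:
        raise ValueError("latitude out of range")
    return latitude

lats = pd.Series([40.993594, 47.957292, 40.485742])

# A single float passes the check without trouble.
check_latitude(lats.iloc[0])

# A whole Series does not: the chained comparison needs one True/False,
# and pandas refuses to collapse several values into a single answer.
try:
    check_latitude(lats)
except ValueError as err:
    message = str(err)
    print(message)
```

This is why the function has to be called once per row rather than once per column.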
    

    We define a function that takes a row, calls utm.from_latlon() and returns a Series of the first two elements of the tuple that we get from utm. Then we use Pandas' apply() to call that function. I just did one set of coordinates, but you can do the same apply() statement for the others.

    EDIT: I changed the function to index by position instead of by name, to make it reusable.

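    import pandas as pd
    import utm

    def getUTMs(row):
        # Positional access: row.iloc[0] is the latitude, row.iloc[1] the longitude.
        tup = utm.from_latlon(row.iloc[0], row.iloc[1])
        # Keep easting and northing; drop zone number and zone letter.
        return pd.Series(tup[:2])

    cities[['utmminy','utmminx']] = cities[['miny','maxx']].apply(getUTMs, axis=1)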
    cities
    
           City   Latitude  Longitude       minx       maxx       miny  \
    0       Roma  41.892916  12.482520  11.274474  13.690566  40.993594   
    1      Paris  48.856614   2.352222   0.985506   3.718938  47.957292   
    2  Barcelona  41.385064   2.173403   0.974837   3.371969  40.485742   
    3     Berlin  52.519171  13.406091  11.928356  14.883826  51.619849   
    4     Moscow  55.755826  37.617300  36.019417  39.215183  54.856504   
    
            maxy        utmminy         utmminx  
    0  42.792238  389862.562124  4538871.624816  
    1  49.755936  553673.645924  5311803.556837  
    2  42.284386  531525.080929  4481738.581782  
    3  53.418493  491957.246518  5718764.545758  
    4  56.655148  513814.029424  6078844.774914
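
To fill in all four columns, the same row-wise idea can be wrapped in a small helper. The sketch below is only one possible arrangement: add_utm_columns and its parameters are hypothetical names, and the converter is passed in as an argument so the helper does not depend on a particular library. It assumes the converter takes (latitude, longitude) and returns a tuple that starts with easting and northing, as utm.from_latlon does, so x-type columns receive the first value and y-type columns the second.

```python
import pandas as pd

def add_utm_columns(df, lat_col, lon_col, east_col, north_col, convert):
    """Return a copy of df with easting/northing columns for one lat/lon pair.

    `convert` is any callable taking (latitude, longitude) and returning a
    tuple whose first two items are easting and northing.
    The helper name and signature are illustrative, not part of any library.
    """
    def to_xy(row):
        # Only the first two items are kept; zone information is discarded.
        easting, northing = convert(row[lat_col], row[lon_col])[:2]
        return pd.Series({east_col: easting, north_col: northing})

    out = df.copy()
    xy = df.apply(to_xy, axis=1)  # one converter call per row
    out[east_col] = xy[east_col]
    out[north_col] = xy[north_col]
    return out
```

With utm imported, the two corners of each bounding box could then be converted like this:

    cities = add_utm_columns(cities, 'miny', 'minx', 'utmminx', 'utmminy', utm.from_latlon)
    cities = add_utm_columns(cities, 'maxy', 'maxx', 'utmmaxx', 'utmmaxy', utm.from_latlon)

Since the zone number and letter are dropped, keep in mind that cities in different UTM zones end up with eastings that are not directly comparable.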