Tags: python, python-3.x, pandas, gis, proj

Converting between projections using pyproj in Pandas dataframe


This is undoubtedly a bit of a "can't see the wood for the trees" moment. I've been staring at this code for an hour and can't see what I've done wrong. I know it's staring me in the face but I just can't see it!

I'm trying to convert between two geographical co-ordinate systems using Python.

I have longitude (x-axis) and latitude (y-axis) values and want to convert to OSGB 1936. For a single point, I can do the following:

import numpy as np
import pandas as pd
import shapefile
import pyproj

inProj = pyproj.Proj(init='epsg:4326')
outProj = pyproj.Proj(init='epsg:27700')

x1,y1 = (-2.772048, 53.364265)

x2,y2 = pyproj.transform(inProj,outProj,x1,y1)

print(x1,y1)
print(x2,y2)

This produces the following:

-2.772048 53.364265
348721.01039783185 385543.95241055806

This seems reasonable: the longitude of -2.772048 is converted to an x co-ordinate (easting) of 348721.0103978.

In fact, I want to do this in a Pandas dataframe. The dataframe has longitude and latitude columns, and I want to add two columns holding the converted co-ordinates (called newLong and newLat).

An exemplar dataframe might be:

    latitude  longitude
0  53.364265  -2.772048
1  53.632481  -2.816242
2  53.644596  -2.970592

And the code I've written is:

import numpy as np
import pandas as pd
import shapefile
import pyproj

inProj = pyproj.Proj(init='epsg:4326')
outProj = pyproj.Proj(init='epsg:27700')

df = pd.DataFrame({'longitude':[-2.772048,-2.816242,-2.970592],'latitude':[53.364265,53.632481,53.644596]})

def convertCoords(row):
    x2,y2 = pyproj.transform(inProj,outProj,row['longitude'],row['latitude'])
    return pd.Series({'newLong':x2,'newLat':y2})

df[['newLong','newLat']] = df.apply(convertCoords,axis=1)

print(df)

Which produces:

    latitude  longitude        newLong         newLat
0  53.364265  -2.772048  385543.952411  348721.010398
1  53.632481  -2.816242  415416.003113  346121.990302
2  53.644596  -2.970592  416892.024217  335933.971216

But now it seems that the newLong and newLat values have been mixed up (compared with the results of the single point conversion shown above).

Where have I got my wires crossed to produce this result? (I apologise if it's completely obvious!)


Solution

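  • When you do df[['newLong','newLat']] = df.apply(convertCoords,axis=1), the columns of the df.apply output are assigned by position, not matched by label. The column order of that output comes from the Series you built from a dictionary, and a dictionary's order was not guaranteed (before Python 3.7). Here newLat came out first, so its values landed in the newLong column.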

    You can opt to return a Series with a fixed column ordering:

    return pd.Series([x2, y2])
    

    Alternatively, if you want to keep the convertCoords output labelled, then you can use .join to combine results instead:

    return pd.Series({'newLong':x2,'newLat':y2})
    ...
    df = df.join(df.apply(convertCoords, axis=1))
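
To see the positional assignment in isolation, here is a minimal sketch that needs only pandas. The function fakeConvert is a hypothetical stand-in for convertCoords: it returns its inputs unchanged rather than calling pyproj, and its Series index is deliberately put in the order newLat, newLong to mimic what the dictionary produced in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'longitude': [-2.772048, -2.816242, -2.970592],
    'latitude': [53.364265, 53.632481, 53.644596],
})

def fakeConvert(row):
    # Hypothetical stand-in for the pyproj call: passes the values through.
    x2, y2 = row['longitude'], row['latitude']
    # Index order (newLat first) is an assumption chosen to mimic the
    # ordering that the dictionary gave in the question.
    return pd.Series([y2, x2], index=['newLat', 'newLong'])

converted = df.apply(fakeConvert, axis=1)
print(list(converted.columns))  # ['newLat', 'newLong']

# Positional assignment: the first column on the right goes to the first
# name on the left, whatever its label, so the values end up swapped.
swapped = df.copy()
swapped[['newLong', 'newLat']] = converted

# Fix 1: select the right-hand columns by label, in the order wanted.
by_label = df.copy()
by_label[['newLong', 'newLat']] = converted[['newLong', 'newLat']]

# Fix 2: join keeps the labels that the function returned.
joined = df.join(converted)

print(swapped)
print(joined)
```

In swapped, newLong holds the latitude values; in by_label and joined, each column holds what its name says. Selecting by label or using .join keeps working even if the order of the returned Series changes.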