python-3.x, pandas, tuples, geopy

Map a new column to a Pandas dataframe by comparing current and previous row of tuples


I am trying to map a new column to a pandas dataframe using a custom function that takes in two input tuples. The function is:

import math

def distance(origin, destination):
    """Haversine great-circle distance in miles between two (lat, lon) tuples."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 3958.8  # Earth's radius in miles

    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) * math.sin(dlat / 2)
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) * math.sin(dlon / 2))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    d = radius * c

    return d
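The same haversine formula can also be written with NumPy so that it accepts whole arrays instead of one pair of tuples at a time. This is a sketch assuming NumPy is installed; `haversine_np` is a hypothetical name, and it takes separate latitude and longitude arguments in degrees rather than tuples.

```python
import numpy as np

def haversine_np(lat1, lon1, lat2, lon2, radius=3958.8):
    """Hypothetical helper: great-circle distance in miles.

    Accepts scalars or NumPy arrays of coordinates in degrees.
    """
    # Convert every input from degrees to radians
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Same term as distance(): sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return radius * c

# One degree of longitude along the equator is about 69.09 miles
print(haversine_np(0.0, 0.0, 0.0, 1.0))
```

Because every operation is element-wise, passing NumPy arrays (or pandas columns) computes all the distances in one call.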

The dataframe has a column, lat_long, holding (lat, long) tuples, and I am trying to measure the distance between each row's coordinates and the previous row's coordinates.
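A column of (lat, long) tuples is often easier to work with as two numeric columns. A minimal sketch, assuming every tuple has exactly two elements; the sample values are made up, and the new column names lat and lon are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: a lat_long column of (lat, long) tuples
df3 = pd.DataFrame({"lat_long": [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]})

# Expand each tuple into two columns, keeping df3's index so the rows line up
df3[["lat", "lon"]] = pd.DataFrame(
    df3["lat_long"].tolist(), index=df3.index, columns=["lat", "lon"]
)
print(df3)
```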

I have tried a for loop:

df3.loc[0, 'dist'] = 0
for i in range(1, len(df3)):
    df3.loc[i, 'dist'] = distance(df3.loc[i-1, 'lat_long'], df3.loc[i, 'lat_long'])

but I get an error "ValueError: not enough values to unpack (expected 2, got 1)"
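That error means one of the values passed to distance() unpacked to a single element instead of a (lat, long) pair. The exact cause can't be confirmed without seeing df3, but a quick check is to list every lat_long entry that is not a two-element tuple or list. The helper name `bad_coordinate_rows` and the sample data below are hypothetical.

```python
import pandas as pd

def bad_coordinate_rows(series):
    """Hypothetical helper: return entries that cannot be unpacked as `lat, lon`."""
    def is_pair(value):
        # Accept only tuples or lists with exactly two elements
        return isinstance(value, (tuple, list)) and len(value) == 2
    return series[~series.map(is_pair)]

# Hypothetical data: a valid pair, a 1-element tuple, and a missing value
s = pd.Series([(1.0, 2.0), (3.0,), None])
print(bad_coordinate_rows(s))
```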

Any ideas on how to do this better?


Solution

  • Synthesised data to illustrate:

    1. use reset_index() to expose the row number as a column named index
    2. for each row, construct a range() from the previous row to the current row, flooring the start at 0 for the first row
    3. pass the resulting array of tuples to tuplecalc() (you noted your lat/long values are tuples)
    4. distance() here is a placeholder calculation that only demonstrates both the current and previous rows are used
    5. finally, drop the synthetic index column
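    import pandas as pd
    # Synthetic data; reset_index() adds the row number as a column named "index"
    df = pd.DataFrame({"geo": [(1, 2), (3, 4), (5, 6)]}).reset_index()
    def distance(prev, curr):
        # Placeholder calculation, not a real distance: sums both coordinate pairs
        return prev[0] + prev[1] + curr[0] + curr[1]
    def tuplecalc(tuples):
        # The first row's slice holds only its own tuple, so (0, 0) fills the other slot
        return distance(tuples[0], tuples[1] if len(tuples) == 2 else (0, 0))
    # For each row, take the "geo" values from the previous row (floored at 0) through the current row
    df["distance"] = df.apply(lambda r: tuplecalc(df.loc[range(max(r["index"] - 1, 0), r["index"] + 1), "geo"].values), axis=1)
    # drop() returns a new DataFrame, so assign the result back
    df = df.drop(columns=["index"])
    df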
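Another option, which avoids building a range() for every row, is Series.shift(): shifting the lat_long column down by one row lines each row up with the previous row's tuple. A minimal sketch, assuming the column holds (lat, long) tuples and reusing the question's haversine distance(); the coordinates are made up, and the first row gets 0 as in the question's loop.

```python
import math
import pandas as pd

def distance(origin, destination):
    # Haversine distance in miles, same formula as in the question
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 3958.8
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon / 2) ** 2)
    return radius * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Hypothetical sample data in the question's format
df3 = pd.DataFrame({"lat_long": [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]})

# shift() moves every tuple down one row; the first row gets NaN (no previous row)
prev = df3["lat_long"].shift()
df3["dist"] = [
    0.0 if i == 0 else distance(prev_pt, curr_pt)
    for i, (prev_pt, curr_pt) in enumerate(zip(prev, df3["lat_long"]))
]
print(df3)
```

Iterating over plain Python values in the list comprehension also avoids the per-row .loc lookups of the original loop.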
    

    Alternatively, the previous row's values can be added as additional columns:

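    import pandas as pd
    # Synthetic data; reset_index() adds the row number as a column named "index"
    df = pd.DataFrame({"long": [1, 3, 5], "lat": [2, 4, 6]}).reset_index()
    def rowrange(index, col):
        # Value of `col` in the previous row, or 0 for the first row
        return 0 if index == 0 else df.loc[range(max(index - 1, 0), index), col].values[0]
    df["prev_long"] = df.apply(lambda r: rowrange(r["index"], "long"), axis=1)
    df["prev_lat"] = df.apply(lambda r: rowrange(r["index"], "lat"), axis=1)
    df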
    

    Output of the first (tuple) example:

          geo  distance
    0  (1, 2)         3
    1  (3, 4)        10
    2  (5, 6)        18
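The prev_long and prev_lat columns from the second example can also be built without apply(): Series.shift() moves each value down one row, and fill_value=0 reproduces the 0 that rowrange() returns for the first row. A sketch on the same synthetic data; the reset_index() column is not needed here.

```python
import pandas as pd

df = pd.DataFrame({"long": [1, 3, 5], "lat": [2, 4, 6]})

# shift(fill_value=0) moves each value down one row and puts 0 in the first row,
# matching what rowrange() returns
df["prev_long"] = df["long"].shift(fill_value=0)
df["prev_lat"] = df["lat"].shift(fill_value=0)
print(df)
```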