Search code examples
pythonpandasdataframezip-operator

Python: TypeError: zip argument #1 must support iteration


I'm getting an error when using the zip(*map(...)) call. Long explanation see below.

TypeError: zip argument #1 must support iteration

Here's what I got. A dataframe containing cities and their location in longitude and latitude. Now I want to calculate the distance between the cities using the harversine formular.

Starting point is this Pandas DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
                   {'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
                   {'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}]);
df

Then I'm joining the dataframe with itself in order to get pairs of cities:

df['tmp'] = 1
df2 = pd.merge(df,df,on='tmp')
df2 = df2[df2.city_x != df2.city_y]

Which gives me this:

    city_x  lat_x       lng_x       tmp city_y  lat_y       lng_y
1   Berlin  52.52437    13.41053    1   Potsdam 52.39886    13.06566
2   Berlin  52.52437    13.41053    1   Hamburg 53.57532    10.01534
3   Potsdam 52.39886    13.06566    1   Berlin  52.52437    13.41053
5   Potsdam 52.39886    13.06566    1   Hamburg 53.57532    10.01534
6   Hamburg 53.57532    10.01534    1   Berlin  52.52437    13.41053
7   Hamburg 53.57532    10.01534    1   Potsdam 52.39886    13.06566

Now let's do the important part. The harversine formular is put into a function:

def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """
    Computes the distance in kilometers between two points on a sphere given their longitudes and latitudes 
    based on the Harversine formula. https://en.wikipedia.org/wiki/Haversine_formula
    """
    from math import radians, cos, sin, asin, sqrt
    R = 6371 # Radius of earth in kilometers. Use 3956 for miles

    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])

    # haversine formula 
    dlng = lng2 - lng1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
    c = 2 * asin(sqrt(a)) 
    distance = c * R
    return distance

This function should then be called on the joined dataframe:

def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    dist = zip(*map(haversine_distance, lng1, lat1, lng2, lat2))
    return dist

# now invoke the method in order to get a new column (series) back
get_haversine_distance(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])

Problem/Error: This gives me the following error:

TypeError: zip argument #1 must support iteration

Remark: What I don't get, is why I'm getting the error since this other method (see below) works perfectly fine. Basically the same thing!

def lat_lng_to_cartesian(lat: float, lng: float) -> float:
    from math import radians, cos, sin
    R = 6371 # Radius of earth in kilometers. Use 3956 for miles

    lat_, lng_ = map(radians, [lat, lng])

    x = R * cos(lat_) * cos(lng_)
    y = R * cos(lat_) * sin(lng_)
    z = R * sin(lat_)
    return x, y, z

def get_cartesian_coordinates(lat: pd.Series, lng: pd.Series) -> (pd.Series, pd.Series, pd.Series):
    if lat is None or lng is None:
        return
    x, y, z = zip(*map(lat_lng_to_cartesian, lat, lng))
    return x, y, z

get_cartesian_coordinates(df2['lat_x'], df2['lng_x'])

Solution

  • As I mentioned in the comments, to be able to use the haversine_distance in the current way you've defined it, you are going to need to zip those columns first before mapping. In essence, you will need to edit the get_haversine_distance function to make sure that it is zipping the corresponding rows into tuples before unpacking each tuple into arguments for the haversine_distance function. The following is an illustration, using the provided data:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
                       {'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
                       {'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}]);
    df
    
    #       city       lat       lng  tmp
    # 0   Berlin  52.52437  13.41053    1
    # 1  Potsdam  52.39886  13.06566    1
    # 2  Hamburg  53.57532  10.01534    1
    
    # Make sure to reset the index after you filter out the unneeded rows
    df['tmp'] = 1
    df2 = pd.merge(df,df,on='tmp')
    df2 = df2[df2.city_x != df2.city_y].reset_index(drop=True)
    
    #     city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y
    # 0   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566
    # 1   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534
    # 2  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053
    # 3  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534
    # 4  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053
    # 5  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566
    
    def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
        dist = pd.Series(map(lambda x: haversine_distance(*x), zip(lng1, lat1, lng2, lat2)))
        return dist
    
    
    def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
        """
        Computes the distance in kilometers between two points on a sphere given their longitudes and latitudes 
        based on the Harversine formula. https://en.wikipedia.org/wiki/Haversine_formula
        """
        from math import radians, cos, sin, asin, sqrt
        R = 6371 # Radius of earth in kilometers. Use 3956 for miles
        lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])
        # haversine formula
        dlng = lng2 - lng1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
        c = 2 * asin(sqrt(a))
        distance = c * R
        return distance
    
    
    df2['distance'] = get_haversine_distance(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])
    
    #     city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y    distance
    # 0   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566   27.215704
    # 1   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534  255.223782
    # 2  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053   27.215704
    # 3  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534  242.464120
    # 4  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053  255.223782
    # 5  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566  242.464120
    

    Let me know if this is what you expect the output to look like.