Search code examples
pythonmerging-data

Merging datasets in Python: 1st data set with city name and 2nd with two different columns for city_id for origin and destination cities to match with


I need some help in understanding merging 2 data sets in one in Python.

Their layout is as follows,

  1. First has columns: city_id, city_name
  2. Second has: origincity_id, lat, long, destinationcity_id, lat, long

I wanted to understand as to how I could merge these two data sets using python or pandas to match the city_name in the first data set to create two columns to match the origin city id and destination city id?

Thank you.


Solution

  • Use map.

    Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.

    Here we provide another Series from our city lookup dataframe.

    >>> import pandas as pd
    >>> lookup_df = pd.DataFrame([{"id":1,"city":"NY"},{"id":2,"city":"TX"}])
    >>> 
    >>> lookup_df
      city  id
    0   NY   1
    1   TX   2
    >>> data_df = pd.DataFrame([{"id1":2,"id2":1},{"id1":1,"id2":2}])
    >>> data_df
       id1  id2
    0    2    1
    1    1    2
    >>> data_df["id1_city"] = data_df["id1"].map(lookup_df.set_index("id")["city"])
    >>> data_df["id2_city"] = data_df["id2"].map(lookup_df.set_index("id")["city"])
    >>> 
    >>> data_df
       id1  id2 id1_city id2_city
    0    2    1       TX       NY
    1    1    2       NY       TX