Search code examples
pythonstringdataframestring-concatenation

python string concatenation following a sequence


What would be the most pythonic way of creating the following string concatenation:

We have an initial dataframe with some of the columns being:

  1. origin
  2. dest_1_country
  3. dest_1_city
  4. dest_2_country
  5. dest_2_city
  6. dest_3_country
  7. dest_3_city
  8. dest_4_country
  9. dest_4_city

We want to create an additional column that is the full route for every row in the dataframe and that could be generated by

df['full_route'] = df['origin].fillna("") + df['dest_1_country].fillna("") + df['dest_1_city].fillna("") + df['dest_2_country].fillna("") + df['dest_2_city].fillna("") + df['dest_3_country].fillna("") + df['dest_3_city].fillna("") + df['dest_4_country].fillna("") + df['dest_4_city].fillna("")

Obviously this cannot be the most pythonic way of getting the desired result given how cumbersome it is.. what if I had 100 cities in the df?

What would be the best way to achieving this in python?

Note: in the dataframe, there're other columns that have nothing to do with the route and that shouldn't be considered in the concatenation.

Thanks a lot!!


Solution

  • If you have this dataframe:

      origin dest_1_country dest_1_city dest_2_country dest_2_city
    0      a              b           c              d           e
    1      f              g           h              i           j
    

    Then you can do:

    df["full_route"] = df.sum(axis=1)  # df.fillna("").sum(axis=1) if you have NaNs
    print(df)
    

    To concatenate all columns:

      origin dest_1_country dest_1_city dest_2_country dest_2_city full_route
    0      a              b           c              d           e      abcde
    1      f              g           h              i           j      fghij
    

    EDIT: If you want to concatenate "origin" and every "*city"/"*country" column, you can do:

    df["full_route"] = df["origin"].fillna("") + df.filter(
        regex=r"country$|city$"
    ).fillna("").sum(axis=1)
    print(df)