What would be the most pythonic way of creating the following string concatenation:
We have an initial dataframe with some of the columns being:
We want to create an additional column that is the full route for every row in the dataframe and that could be generated by
df['full_route'] = df['origin].fillna("") + df['dest_1_country].fillna("") + df['dest_1_city].fillna("") + df['dest_2_country].fillna("") + df['dest_2_city].fillna("") + df['dest_3_country].fillna("") + df['dest_3_city].fillna("") + df['dest_4_country].fillna("") + df['dest_4_city].fillna("")
Obviously this cannot be the most pythonic way of getting the desired result given how cumbersome it is.. what if I had 100 cities in the df?
What would be the best way to achieving this in python?
Note: in the dataframe, there're other columns that have nothing to do with the route and that shouldn't be considered in the concatenation.
Thanks a lot!!
If you have this dataframe:
origin dest_1_country dest_1_city dest_2_country dest_2_city
0 a b c d e
1 f g h i j
Then you can do:
df["full_route"] = df.sum(axis=1) # df.fillna("").sum(axis=1) if you have NaNs
print(df)
To concatenate all columns:
origin dest_1_country dest_1_city dest_2_country dest_2_city full_route
0 a b c d e abcde
1 f g h i j fghij
EDIT: If you want to concatenate "origin" and every "*city"/"*country" column, you can do:
df["full_route"] = df["origin"].fillna("") + df.filter(
regex=r"country$|city$"
).fillna("").sum(axis=1)
print(df)