A dataframe I am working on contains an address line, from which I wish to extract the different variables (street name, number, postal code and country). The address line is filled according to the following template: "street number, postal code city, country".
Using .str.split(), I can get invividual elements from the string. Returning multiple elements however does not work.
So, extracting the country is easy enough. But when I try to get the street and number I run into some troubles. The street can contain one or more whitespaces, but after the last whitespace there is always the number. So I try to do this:
df["street_and_number"]=df["address"].str.split(",").str[0]
which creates a column with the street and the number. So far so good.
If I want to get the street, splitting on whitespaces and extracting everything but the last element seems (to me) a pretty straightforward option:
df["street"]=df["street_and_number"].str.split().str[:-1]
But this returns a list, see example below:
1541 [Burgemeester, Roelenweg]
Name: straat_en_nr, dtype: object
I would like to get a string, instead of a list with all the elements. Using the expand option doesn't produce the required result either:
df["street"]=df["street_and_number"].str.split(expand=True)[:-1]
Output:
0 1 2
What am I doing wrong?
Add Series.str.join
:
df["street"] = df["street_and_number"].str.split().str[:-1].str.join(' ')