Search code examples
pandasstringsplitreturnextract

Pandas: splitting a string and returning multiple elements does not work - only one element is returned


A dataframe I am working on contains an address line, from which I wish to extract the different variables (street name, number, postal code and country). The address line is filled according to the following template: "street number, postal code city, country".

Using .str.split(), I can get invividual elements from the string. Returning multiple elements however does not work.

So, extracting the country is easy enough. But when I try to get the street and number I run into some troubles. The street can contain one or more whitespaces, but after the last whitespace there is always the number. So I try to do this:

df["street_and_number"]=df["address"].str.split(",").str[0]

which creates a column with the street and the number. So far so good.

If I want to get the street, splitting on whitespaces and extracting everything but the last element seems (to me) a pretty straightforward option:

df["street"]=df["street_and_number"].str.split().str[:-1]

But this returns a list, see example below:

1541    [Burgemeester, Roelenweg]
Name: straat_en_nr, dtype: object

I would like to get a string, instead of a list with all the elements. Using the expand option doesn't produce the required result either:

df["street"]=df["street_and_number"].str.split(expand=True)[:-1]

Output:


    0   1   2

What am I doing wrong?


Solution

  • Add Series.str.join:

    df["street"] = df["street_and_number"].str.split().str[:-1].str.join(' ')