
Split string and integer in Pandas series - Python


I have one column in a Pandas dataframe that holds both the movie title and the year in the same string (e.g. "Toy Story (1995)"). I need to split them into two separate columns, and the year must be an integer. I tried the method below, but the year stays an "object" dtype because it still has the parentheses around it. It also doesn't work for one movie (there's still a title)...

split_movie = movies["Movie"].str.rsplit(" ", n = 1, expand=True)
movies["Movie Title"] = split_movie[0]
movies["Movie Year"] = split_movie[1]

I don't know if I can use something like a pd.year method, or if I have to split the string in plain Python by creating a list...

Thanks for your help!


Solution

  • Keeping closer to your original code...

    Try:

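    # Split on the last "(" so titles that contain parentheses stay intact
    movies[['Title', 'Year']] = movies["Movie"].str.rsplit("(", n=1, expand=True)
    movies['Title'] = movies['Title'].str.strip()  # drop the space left before "("
    movies['Year'] = movies['Year'].str.replace(')', '', regex=False)
    movies['Year'] = movies['Year'].astype('int64')  # raises if a row has no year
    movies.info()  # info() prints directly and returns None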
    

    Outputs:

     #   Column   Non-Null Count  Dtype 
    ---  ------   --------------  ----- 
     0   Movie    15 non-null     object 
     1   Title    15 non-null     object
     2   Year     15 non-null     int64
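
  • If one of your rows has no "(year)" at the end (which may be why one movie misbehaves), the astype('int64') step above will fail. A possible alternative is a regex with str.extract, sketched below. The sample data, including the row without a year, is made up for illustration, and the pattern assumes the year is always four digits in parentheses at the end of the string.

```python
import pandas as pd

# Hypothetical sample data; the last row deliberately has no year
movies = pd.DataFrame({
    "Movie": [
        "Toy Story (1995)",
        "Heat (1995)",
        "Se7en (a.k.a. Seven) (1995)",
        "Untitled Movie",
    ]
})

# Capture the text before the final "(YYYY)" as Title and the digits as Year.
# Rows that do not match the pattern get NaN in both columns.
parts = movies["Movie"].str.extract(r"^(?P<Title>.*?)\s*\((?P<Year>\d{4})\)\s*$")

# Fall back to the full string when no year was found
movies["Title"] = parts["Title"].fillna(movies["Movie"].str.strip())

# Nullable integer dtype keeps real integers while allowing missing years
movies["Year"] = pd.to_numeric(parts["Year"], errors="coerce").astype("Int64")

print(movies[["Title", "Year"]])
```

    The "Int64" (capital I) dtype is pandas' nullable integer type, so a missing year shows up as <NA> instead of forcing the whole column to float or object.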