Tags: python, pandas, string, dataframe, substring

Breaking a Python string in a pandas DataFrame


I have a column 'released' with values like 'June 13, 1980 (United States)'.

I want to get the year from this string, so I tried the following code:

df['year_correct'] = df['released'].astype(str).str[',':'(']

But it returns NaN for every value in the new 'year_correct' column. Please help.


Solution

  • A better way might be to extract the 4-digit value, using word boundaries (\b) so that only a standalone run of exactly 4 digits matches:

    df['year_correct'] = df['released'].astype(str).str.extract(r'\b(\d{4})\b')
    

    Example:

                            released year_correct
    0  June 13, 1980 (United States)         1980
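
    For reference, here is a minimal self-contained sketch of the same approach. The sample rows other than the first, and the 'year_int' column, are made up for illustration and are not from the question. Passing expand=False is an optional choice that makes str.extract return a Series rather than a one-column DataFrame; rows with no 4-digit number come back as missing values.

```python
import pandas as pd

# Sample data: only the first row comes from the question,
# the rest are hypothetical rows added for illustration.
df = pd.DataFrame({
    "released": [
        "June 13, 1980 (United States)",
        "July 2, 1980 (United States)",
        "1981 (Japan)",
        "no date given",
    ]
})

# Capture a standalone 4-digit number; \b on both sides keeps the
# pattern from matching part of a longer run of digits.
df["year_correct"] = (
    df["released"].astype(str).str.extract(r"\b(\d{4})\b", expand=False)
)

# Optional (assumption: an integer year is wanted): convert the
# extracted strings to a nullable integer column.
df["year_int"] = pd.to_numeric(df["year_correct"]).astype("Int64")

print(df)
```

    The slice in the question fails because string slices expect integer positions, not delimiter characters, so a pattern-based method such as str.extract is the more natural fit here.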