python pandas dataframe compare drop

Remove rows where the same value is repeated in two columns of a DataFrame


I have a data frame with name and actors columns (shown as an image in the original post), and the dataset is available as a CSV file.

This data was extracted from the IMDb dataset. My problem is that I cannot remove the actors' names that are repeated within the same row. For example, in row number 4 I want to drop 'Marie Gruber', which appears in both the name and actors columns. I tried apply and several conditions, but the result always comes back unchanged, for example with this code:

data[data['name'] != data['actors']]

Solution

  • There are stray spaces around the values in the actors column, so the strings never compare equal. First remove the spaces with Series.str.strip:

    data['actors'] = data['actors'].str.strip()
    data[data['name'] != data['actors']]
    

    Or, if the extra spaces come right after the delimiter, use skipinitialspace=True in read_csv:

    import pandas as pd

    data = pd.read_csv(file, skipinitialspace=True)
    data[data['name'] != data['actors']]
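Both fixes can be checked on a small frame. The sketch below is only an illustration: the CSV text is made up (only 'Marie Gruber' comes from the question), and it assumes the extra space sits right after the comma, which is the case skipinitialspace handles.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real file: every value in
# the 'actors' column is preceded by a space after the comma.
csv_text = (
    "name,actors\n"
    "Marie Gruber, Marie Gruber\n"
    "Alice Example, Bob Example\n"
)

# Plain read: 'actors' holds ' Marie Gruber' (leading space), so the
# comparison treats the two columns as different and drops nothing.
raw = pd.read_csv(io.StringIO(csv_text))
unfiltered = raw[raw['name'] != raw['actors']]

# Fix 1: strip whitespace from the column, then compare.
cleaned = raw.copy()
cleaned['actors'] = cleaned['actors'].str.strip()
filtered_strip = cleaned[cleaned['name'] != cleaned['actors']]

# Fix 2: let read_csv skip the spaces that follow the delimiter.
parsed = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
filtered_skip = parsed[parsed['name'] != parsed['actors']]

print(len(unfiltered))
print(filtered_strip)
print(filtered_skip)
```

Note that str.strip removes whitespace on both sides of each value, while skipinitialspace only removes spaces that follow the delimiter, so str.strip is the safer choice if the file may also contain trailing spaces.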