python pandas string categorical-data

Searching for a string value in a non-numeric column


I have a DataFrame that contains 20000 chess matches, with a column named opening_name.

All values in this column are strings like this:

1        Nimzowitsch Defense: Kennedy Variation
2         King's Pawn Game: Leonardis Variation
3        Queen's Pawn Game: Zukertort Variation
4                              Philidor Defense
                          ...                  
20053                             Dutch Defense
20054                              Queen's Pawn
20055           Queen's Pawn Game: Mason Attack
20056                              Pirc Defense
20057           Queen's Pawn Game: Mason Attack

In this column, almost a hundred values have similar names, such as Sicilian defence and Sicilian defence: dragon variation. I want to access all the values that start with the string Sicilian defence. How can I do that?


Solution

  • As I understand it, the part of the string before the : is the starting string and the part after it is the variation.

    Step 1: Extract the prefix for every row by splitting on the colon.

    df['base_term'] = df['opening_name'].str.split(':').str[0]
    

    This adds a new column, base_term, that holds the starting string of each opening name.

    Step 2: Retrieve the rows that have a given starting string, for example "Sicilian defence".

    res = df.loc[df['base_term'] == 'Sicilian defence']
    

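    res is the DataFrame of rows whose opening_name has 'Sicilian defence' as the part before the colon. The comparison is exact and case-sensitive, so the string must match the spelling used in the data (the sample above shows "Defense").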
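
As an alternative to creating a helper column, the prefix can probably be matched directly with pandas' `Series.str.startswith`. The sketch below is a minimal example on made-up sample rows (the five values and the `prefix` variable are assumptions for illustration, not the real data set); it shows a case-sensitive match and a case-insensitive variant.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real 20000-row frame.
df = pd.DataFrame({
    "opening_name": [
        "Sicilian Defense",
        "Sicilian Defense: Dragon Variation",
        "Queen's Pawn Game: Mason Attack",
        "Pirc Defense",
        None,  # a missing value, to show how it is handled
    ]
})

prefix = "Sicilian Defense"  # assumed spelling; adjust to match the data

# Case-sensitive prefix match; na=False treats missing values as non-matches.
mask = df["opening_name"].str.startswith(prefix, na=False)
res = df.loc[mask]

# Case-insensitive variant: lowercase both sides before comparing.
mask_ci = df["opening_name"].str.lower().str.startswith(prefix.lower(), na=False)
res_ci = df.loc[mask_ci]

print(res)
```

Note that `startswith` also matches any longer name that merely begins with the prefix, whereas the split-and-compare approach requires the part before the colon to equal the prefix exactly. Which one fits depends on the names in the column.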