In the data frame that contains 20000 chess matches, there is a column named opening_name.
All values in this column are strings like this:
1 Nimzowitsch Defense: Kennedy Variation
2 King's Pawn Game: Leonardis Variation
3 Queen's Pawn Game: Zukertort Variation
4 Philidor Defense
...
20053 Dutch Defense
20054 Queen's Pawn
20055 Queen's Pawn Game: Mason Attack
20056 Pirc Defense
20057 Queen's Pawn Game: Mason Attack
In this column, almost a hundred values share a common prefix, for example Sicilian defence and Sicilian defence: dragon variation.
I want to access all the values that start with the string Sicilian defence. How can I do that?
As I understand it, the part of the string before the : is the base opening name and the part after it is the variation.
Step 1: Extract the prefix for every row by splitting on ":" and keeping the first piece.
df['base_term'] = df['opening_name'].apply(lambda x: x.split(":")[0])
This adds a new column, base_term, that holds the starting string of each value.
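Step 2: Retrieve the rows that have a given starting string, for example "Sicilian defence".
res = df.loc[df['base_term'] == 'Sicilian defence']
res is the data frame containing the rows whose opening_name begins with 'Sicilian defence'. The comparison is exact and case-sensitive, so the string must match the spelling and capitalization used in the column.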
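For reference, here is a minimal end-to-end sketch of the two steps, followed by an alternative that filters with str.startswith and needs no helper column. The sample rows are made up for illustration, and the spelling "Sicilian Defense" is an assumption based on the "Defense" spelling in the sample output above; adjust it to whatever your column actually contains.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data frame.
df = pd.DataFrame({
    "opening_name": [
        "Sicilian Defense",
        "Sicilian Defense: Dragon Variation",
        "Queen's Pawn Game: Mason Attack",
        "Philidor Defense",
        "sicilian defense: Najdorf Variation",  # lower-case on purpose
    ]
})

# Step 1: keep the text before the first ':' as the base opening name.
df["base_term"] = df["opening_name"].str.split(":").str[0].str.strip()

# Step 2: exact, case-sensitive match on the base name.
res = df.loc[df["base_term"] == "Sicilian Defense"]

# Alternative: prefix match directly on the original column.
res_prefix = df.loc[df["opening_name"].str.startswith("Sicilian Defense")]

# Case-insensitive variant: lower-case the column before comparing.
res_ci = df.loc[
    df["opening_name"].str.lower().str.startswith("sicilian defense")
]

print(res["opening_name"].tolist())
print(res_ci["opening_name"].tolist())
```

The two approaches can differ: str.startswith also matches any other name that merely begins with the same characters, while the comparison on base_term only matches when the whole part before the colon is equal.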