I have a dataframe called data
, that looks like like this:
|...|category|...|ngram|...|
I need to slice this dataframe to instances where category
starts with the value of ngram
. So for example, if I had the following instance:
then that instance should be dropped from the resulting dataframe.
In T-SQL, I use the following query (which may not be the best way, but it works):
SELECT
*
FROM mytable
WHERE category NOT LIKE ngram+'%';
I have read up on this a bit, and my best attempt is:
data[data.category.str.startswith(data.ngram.str) == True]
But this does not return any rows, nor does the inverse (using == True
)
#use df.apply to filter the rows with category starts with ngram.
data[data.apply(lambda x: x.category.startswith(x.ngram), axis=1)]