Search code examples
pythonpandassql-likestring-matching

Slice a dataframe based on one column starting with the value of another column


I have a dataframe called data, that looks like like this:

|...|category|...|ngram|...|

I need to slice this dataframe to instances where category starts with the value of ngram. So for example, if I had the following instance:

  • category: beds
  • ngram: bed

then that instance should be dropped from the resulting dataframe.

In T-SQL, I use the following query (which may not be the best way, but it works):

SELECT
   *
FROM   mytable
WHERE  category NOT LIKE ngram+'%';

I have read up on this a bit, and my best attempt is:

data[data.category.str.startswith(data.ngram.str) == True]

But this does not return any rows, nor does the inverse (using == True)


Solution

  • #use df.apply to filter the rows with category starts with ngram.
    data[data.apply(lambda x: x.category.startswith(x.ngram), axis=1)]