python apache-spark pyspark rdd

RDD filter with exact word match search


I have an RDD (created from a text file), and I am creating a second RDD by filtering it for lines that contain a given word.

rdd2 = rdd1.filter(lambda x: word in x)

word is a string generated in a for loop, so I search rdd1 for several words in turn. For example, when word is 'ebook', the filter returns every line containing ebook, but it also returns lines containing 'ebooks'.
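The behaviour can be reproduced without Spark, because `in` applied to a string is a substring test. The snippet below is a minimal sketch in plain Python; the sample lines are invented for illustration and are not from the original text file.

```python
# Hypothetical sample lines standing in for the contents of rdd1.
lines = ["buy this ebook today", "we sell ebooks", "no match here"]
word = "ebook"

# `word in x` on a string checks for a substring, so "ebooks" also qualifies.
substring_hits = [x for x in lines if word in x]
print(substring_hits)
```

Both the 'ebook' line and the 'ebooks' line pass this test, which is the same result the RDD filter gives.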

How can I filter an RDD on an exact word match? rdd2 should contain only the lines with the exact word ebook, not ebooks.

I need to create an intermediate RDD for further processing. Please help.


Solution

  • rdd2 = rdd1.filter(lambda x: word in x.split())
    

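    x.split() breaks each line into a list of whitespace-separated words, so the in test checks for a whole word instead of a substring. This worked for the exact word match. Note that a word with punctuation attached, such as 'ebook,', is a different token and will not match.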
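If punctuation next to the word matters, one possible alternative is a regular expression with word-boundary checks. The following is a sketch, not part of the original solution: `make_exact_matcher` is a hypothetical helper name, and the sample lines are invented. It runs on a plain list so it can be tried without a Spark session.

```python
import re

def make_exact_matcher(word):
    """Return a predicate that is True when `word` appears as a whole word in a line."""
    # The lookarounds require that no word character touches `word` on either side;
    # re.escape keeps any regex metacharacters in `word` literal.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return lambda line: pattern.search(line) is not None

# Hypothetical sample lines standing in for the contents of rdd1.
lines = ["buy this ebook today", "we sell ebooks", "Free ebook, limited time."]

# Whitespace split: "ebook," is a different token, so the third line is missed.
split_matches = [x for x in lines if "ebook" in x.split()]

# Regex matcher: the trailing comma does not block the match.
regex_matches = list(filter(make_exact_matcher("ebook"), lines))

print(split_matches)
print(regex_matches)

# With Spark, the same predicate could be passed to the filter (assuming rdd1 exists):
# rdd2 = rdd1.filter(make_exact_matcher(word))
```

Building the predicate through a function call also fixes the value of word when the predicate is created. That may matter when the filters are defined inside a for loop and evaluated later, since a lambda that refers to the loop variable directly looks it up only when it runs.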