python nlp

Python package to extract a sentence from a text file based on keywords


I need a Python package that can return the most relevant sentence from a text, based on the keywords provided.

For example, below is an excerpt from the Wikipedia page of J. Robert Oppenheimer:

Early life

Childhood and education
J. Robert Oppenheimer was born in New York City on April 22, 1904,[note 1][7] to Julius Oppenheimer, a wealthy Jewish textile importer who had immigrated to the United States from Germany in 1888, and Ella Friedman, a painter. 
Julius came to the United States with no money, no baccalaureate studies, and no knowledge of the English language. He got a job in a textile company and within a decade was an executive with the company. Ella was from Baltimore.[8] The Oppenheimers were non-observant Ashkenazi Jews.[9]

The first atomic bomb was successfully detonated on July 16, 1945, in the Trinity test in New Mexico. 
Oppenheimer later remarked that it brought to mind words from the Bhagavad Gita: "Now I am become Death, the destroyer of worlds."

If I pass the string "JJ Oppenheimer birth date", it should return "J. Robert Oppenheimer was born in New York City on April 22, 1904".

If I pass the string "JJ Openheimer Trinity test", it should return "The first atomic bomb was successfully detonated on July 16, 1945, in the Trinity test in New Mexico".

I have searched a lot, but nothing comes close to what I want, and I don't know much about NLP vectorization techniques. It would be great if someone could suggest a package that does this, if one exists.


Solution

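  • You could use fuzzywuzzy.

    fuzz.ratio(search_text, sentence)

    This returns a score from 0 to 100 indicating how similar the two strings are. Score every sentence in the text against the search string and return the one with the highest score.

    https://github.com/seatgeek/fuzzywuzzy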
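
The suggestion above could be sketched roughly as follows. This is only an illustration under some assumptions: the text is already split into a list of sentences, `best_sentence` is a hypothetical helper name (not part of fuzzywuzzy), and the `difflib` fallback is a stand-in added here so the sketch still runs when fuzzywuzzy is not installed.

```python
from difflib import SequenceMatcher

try:
    from fuzzywuzzy import fuzz  # third-party: pip install fuzzywuzzy
    ratio = fuzz.ratio
except ImportError:
    # Assumption for this sketch: approximate a 0-100 similarity score
    # with the standard library when fuzzywuzzy is unavailable.
    def ratio(a, b):
        return round(100 * SequenceMatcher(None, a, b).ratio())


def best_sentence(query, sentences, scorer=ratio):
    """Return (sentence, score) for the highest-scoring sentence.

    `sentences` must be a non-empty list of strings. Comparison is
    case-insensitive because both sides are lowercased before scoring.
    """
    scored = [(s, scorer(query.lower(), s.lower())) for s in sentences]
    return max(scored, key=lambda pair: pair[1])


# Sentences taken from the question; splitting is assumed to be done already.
sentences = [
    "J. Robert Oppenheimer was born in New York City on April 22, 1904.",
    "The first atomic bomb was successfully detonated on July 16, 1945, "
    "in the Trinity test in New Mexico.",
]

for query in ("JJ Oppenheimer birth date", "JJ Openheimer Trinity test"):
    sentence, score = best_sentence(query, sentences)
    print(f"{query!r} -> {score}: {sentence}")
```

Note that `fuzz.ratio` compares the two strings as a whole, so a short query scored against a long sentence may get a low score. fuzzywuzzy also provides `fuzz.partial_ratio` and `fuzz.token_set_ratio`, either of which can be passed as `scorer` and may suit keyword-style queries better.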