I'm using re to take the questions from a text. I just want the sentence with the question, but it's taking multiple sentences before the question as well. My code looks like this:
match = re.findall("[A-Z].*\?", data2)
print(match)
an example of a result I get is:
'He knows me, and I know him. Do YOU know me? Hey?'
the two questions should be separated and the non question sentence shouldn't be there. Thanks for any help.
The .
character in regex matches any text, including periods, which you don't want to include. Why not simply match anything besides the sentence ending punctuation?
questions = re.findall(r"\s*([^\.\?]+\?)", data2)
# \s* sentence beginning space to ignore
# ( start capture group
# [^\.\?]+ negated capture group matching anything besides "." and "?" (one or more)
# \? question mark to end sentence
# ) end capture group