python string substring matching metrics

Python - Get matched string percentage along with the string

I want to match a string against a list of keywords and get both the match percentage and the substring that was matched to each keyword. For example, I have a list of keywords:

keywords = ['Projekt-Nr.:', 'Projektbezeichnung:', 'Anlagenklassifizierung:', 'Arbeiten / Gewerk:']

and some unknown text, e.g.:

s = "Projekthezeichnung: —_[H- Kloster Eig i Krankenhaus"

I want to search for each keyword in this string and get back the substring that it partially matched.

'Projektbezeichnung:' should match 'Projekthezeichnung:' with over 95% similarity (I am already using cdifflib for that), but cdifflib doesn't return the substring that my keyword was matched with.

How can I get the unknown substring that my keyword was partially matched with?

Any help would be quite useful, thanks!

Solution

difflib's get_close_matches seems suitable if you split the unknown text into words first:

from difflib import get_close_matches as gcm

keywords = ['Projekt-Nr.:', 'Projektbezeichnung:', 'Anlagenklassifizierung:', 'Arbeiten / Gewerk:']
unk_text = "Projekthezeichnung: —_[H- Kloster Eig i Krankenhaus"
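# Compare against the individual whitespace-separated words
words = unk_text.split()

# For each keyword, keep every word whose similarity ratio is at least 0.8
result = [gcm(kw, words, n=len(words), cutoff=0.8) for kw in keywords]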
# [[], ['Projekthezeichnung:'], [], []]

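Each sublist of result contains the words that are "close" matches to the corresponding keyword (similarity ratio of at least cutoff), ordered from most to least similar. get_close_matches returns only the matched words, not their scores.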
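If you also need the percentage, one option is to score the candidates yourself with difflib.SequenceMatcher, which is what get_close_matches uses internally. The following is only a rough sketch, not a tuned implementation: best_fuzzy_match is a hypothetical helper name, and it assumes a keyword should be compared against runs of as many whitespace-separated words as the keyword itself contains, so that a multi-word keyword such as 'Arbeiten / Gewerk:' is compared with three-word windows rather than with single words.

```python
from difflib import SequenceMatcher

def best_fuzzy_match(keyword, text):
    """Return (substring, ratio) for the run of words in `text` that is
    most similar to `keyword`. Hypothetical helper, for illustration only."""
    words = text.split()
    n = len(keyword.split())  # window size: number of words in the keyword
    best = ("", 0.0)
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        score = SequenceMatcher(None, keyword, candidate).ratio()
        if score > best[1]:
            best = (candidate, score)
    return best

keywords = ['Projekt-Nr.:', 'Projektbezeichnung:', 'Anlagenklassifizierung:', 'Arbeiten / Gewerk:']
unk_text = "Projekthezeichnung: —_[H- Kloster Eig i Krankenhaus"

# Map each keyword to its best-scoring substring and similarity ratio
matches = {kw: best_fuzzy_match(kw, unk_text) for kw in keywords}

substring, score = matches['Projektbezeichnung:']
print(f"{substring} {score:.3f}")
# Projekthezeichnung: 0.947
```

Every keyword gets a best candidate here, even a poor one, so you would still filter on a threshold of your choice (for example 0.8, as in the cutoff above) before trusting a match.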