python regex case-insensitive prefix

Python regex, searching for prefixes inside a target string


I need to find the words in a target string that start with any of a list of prefixes, and I would like to get the indexes of those matches in the target string as a list.

  • I think using a regex should be the cleanest way.
  • Given that I am looking for the pattern "foo", I would like to find words in the target string such as "foo", "Foo", "fooing" and "Fooing".
  • Given that I am looking for the pattern "foo bar", I would like to find sequences in the target string such as "foo bar", "Foo bar", "foo Bar" and "foo baring" (these all still count as prefixes, am I right?)

So far, after trying it in different scenarios, my Python code still does not work.

  • I am assuming I have to use ^ to match the beginning of a word in a target string (i.e. a prefix).
  • I am assuming I have to use something like ^[fF] to be case insensitive with the first letter of my prefix.
  • I am assuming I should use something like ".*" to let the regexp behave like a prefix.
  • I am assuming I should use something like prefix1|prefix2|prefix3 to combine many different prefixes with a logical OR in the search pattern.
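A minimal sketch of how those assumptions could be adjusted into a working Python 3 regex. It swaps ^ (start of the whole string) for \b (word boundary), the [fF] class for the re.IGNORECASE flag, and .* (which runs to the end of the line) for \w* (which stops at the end of the word). The alternation is built with re.escape, and re.finditer replaces re.match, because re.match only tries to match at position 0. The prefixes list is an illustrative assumption.

```python
import re

# Sample text from the question.
txt_str = ("edb foooooo jkds Fooooooo kj fooing jdcnj Fooing ujndn ggng "
           "sxk foo baring sh foo Bar djw Foo")

# Prefixes to search for (illustrative). "foo bar" is listed first so the
# longer alternative is tried before the plain "foo".
prefixes = ["foo bar", "foo"]

# \b     -> the match must begin at a word boundary
# (?:..) -> logical OR over the escaped prefixes
# \w*    -> the last word may continue ("bar" -> "baring")
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in prefixes) + r")\w*",
    re.IGNORECASE,
)

# finditer scans the whole string and exposes each match's start offset.
matches = [(m.start(), m.group()) for m in pattern.finditer(txt_str)]
print(matches)
```

Each tuple holds the character offset of a match in txt_str and the matched text.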

The following source code does not work because I am wrongly setting the txt_pattern.

import re

# Expected matches: foooooo, Fooooooo, fooing, Fooing, foo baring, foo Bar, Foo
txt_str = "edb foooooo jkds Fooooooo kj fooing jdcnj Fooing ujndn ggng sxk foo baring sh foo Bar djw Foo"
txt_pattern = ''  # ??? what should go here?

out_obj = re.match(txt_pattern, txt_str)
if out_obj:
    print("match!")
else:
    print("No match!")

  1. What am I missing?

  2. How should I set the txt_pattern?

  3. Can you please suggest a good tutorial with minimal working examples? The standard tutorials on the first page of a Google search are very long and detailed, and not easy to understand.

Thanks!


Solution

  • Regex is the wrong approach. First split your string into a list of strings with one word per item, then filter that list with a list comprehension. The split method on strings gives you the list of words, and then you can simply do [item for item in wordlist if item.lower().startswith("foo")] (the lower() call is needed because startswith on its own is case-sensitive, so "Foo" would otherwise be missed).

    People spend ages hacking up inefficient code with convoluted regexes when all they need is a few string methods such as split, partition and startswith, plus some Pythonic list comprehensions or generators.

    Regexes have their uses but simple string parsing is not one of them.
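A minimal sketch of the split-and-filter approach described above, assuming whitespace-separated words and Python 3. find_phrase_prefix is a hypothetical helper, not a standard function. It extends the same string-method idea to multi-word prefixes such as "foo bar", under the assumption that every word of the phrase except the last must match a whole word.

```python
# Sample text from the question.
txt_str = ("edb foooooo jkds Fooooooo kj fooing jdcnj Fooing ujndn ggng "
           "sxk foo baring sh foo Bar djw Foo")
wordlist = txt_str.split()

# str.startswith accepts a tuple of prefixes, which acts as a logical OR.
# It is case-sensitive, so each word is lower-cased before comparing.
prefixes = ("foo",)
matching_words = [item for item in wordlist if item.lower().startswith(prefixes)]
# enumerate also yields each word's position in wordlist.
matching_indexes = [i for i, item in enumerate(wordlist)
                    if item.lower().startswith(prefixes)]


def find_phrase_prefix(text, phrase):
    """Hypothetical helper: return (word_index, matched_words) wherever
    `phrase` is a case-insensitive prefix of consecutive words in `text`."""
    words = text.split()
    target = phrase.lower().split()
    if not target:
        return []
    n = len(target)
    results = []
    for i in range(len(words) - n + 1):
        window = [w.lower() for w in words[i:i + n]]
        # All words except the last must match exactly; the last word only
        # has to start with the final word of the phrase ("bar" -> "baring").
        if window[:-1] == target[:-1] and window[-1].startswith(target[-1]):
            results.append((i, " ".join(words[i:i + n])))
    return results


print(matching_words)
print(matching_indexes)  # [1, 3, 5, 7, 11, 14, 17]
print(find_phrase_prefix(txt_str, "foo bar"))  # [(11, 'foo baring'), (14, 'foo Bar')]
```

The indexes here are positions in the word list, not character offsets in the original string, and matched phrases are rejoined with single spaces.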