Tags: regex, python-3.x, non-ascii-characters

regex - How to match parts of an element in a list to patterns in a txt file (processing Chinese characters)


I am new to Python and regex, and I am trying to match part of each element in a list against patterns stored in a txt file.

Below is an example:

    name_list = ["林俊杰","林宥嘉","周杰伦","宋祖英"]
    pattern = ["杰伦","俊杰"]

What I am trying to do is loop through each element in name_list and check whether any part of the element matches one of the patterns in the pattern list. For example, in name_list[0], "俊杰" matches the second pattern in the pattern list.

Whenever a match occurs, I want to append the match to a new list (new_list = []), in the same order as the elements in name_list. For example, I want "俊杰" to be the first element of new_list.

I also need to load the patterns from a txt file, and I have no idea how to do that either. Can anyone help me with this, please?


Solution

  • You can do it without regex as long as the pattern list contains only literal strings. If that isn't always the case, you only have to change the condition if p in s: to if re.search(p, s): (and add import re at the top).

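    def getMatchPattern(patterns, s):
        # Return the first pattern (in list order) that occurs in s,
        # or an empty string if none does.
        for p in patterns:
            if p in s:
                return p
        return ''
    
    name_list = ["林俊杰","林宥嘉","周杰伦","宋祖英"]
    pattern_list = ["杰伦","俊杰"]
    
    # One entry per name, in name_list order; '' marks a name with no match.
    result = [getMatchPattern(pattern_list, x) for x in name_list]
    # result == ['俊杰', '', '杰伦', '']
    # To keep only the matches: [r for r in result if r]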
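
If the patterns are regular expressions rather than literal strings, the same loop can be written with re.search. The following is only a minimal sketch: the name get_match_regex and the sample patterns are made up for illustration, and it returns the matched text (m.group(0)) instead of the pattern itself, which is a design choice rather than something the question requires.

```python
import re

def get_match_regex(patterns, s):
    """Return the text matched by the first pattern (in list order) found in s, or ''."""
    for p in patterns:
        m = re.search(p, s)
        if m:
            return m.group(0)
    return ''

name_list = ["林俊杰", "林宥嘉", "周杰伦", "宋祖英"]
# Hypothetical regex patterns: "杰." means 杰 followed by any one character.
regex_patterns = ["杰.", "俊杰"]

result = [get_match_regex(regex_patterns, x) for x in name_list]
print(result)
```

If some entries are meant to be taken literally but may contain regex metacharacters, passing them through re.escape first keeps them from being interpreted as regex syntax.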
    

    Note that if a string contains several strings from your pattern list, the one that comes first in the pattern list wins, not the one that appears first in the string. To change this behaviour, remove the return from inside the loop and compare the index of each matching substring with the index of the best match found so far.
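
The "first in the string wins" variant described above could look roughly like this. It is a sketch under the assumption that the patterns are literal strings, so str.find is enough; the function name get_leftmost_match is hypothetical.

```python
def get_leftmost_match(patterns, s):
    """Return the pattern that occurs earliest in s, or '' if none occurs."""
    best, best_index = '', -1
    for p in patterns:
        i = s.find(p)  # -1 when p does not occur in s
        # Keep p only if it starts strictly earlier than the best match so far,
        # so ties go to the pattern listed first.
        if i != -1 and (best_index == -1 or i < best_index):
            best, best_index = p, i
    return best

# "俊杰" starts at index 1 and "杰伦" at index 5, so "俊杰" is returned
# even though "杰伦" comes first in the pattern list.
print(get_leftmost_match(["杰伦", "俊杰"], "林俊杰和周杰伦"))
```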

    As for reading the patterns from a file, a basic tutorial or a quick search will help you.
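
For the file part, here is one possible sketch. It assumes things the question does not state: that the file is called patterns.txt, is encoded as UTF-8, and holds one pattern per line. The helper names load_patterns and matches_in_order are hypothetical. Unlike the list comprehension above, matches_in_order skips names without a match, which is closer to the new_list described in the question.

```python
def load_patterns(path):
    """Read one pattern per line from a UTF-8 text file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def matches_in_order(patterns, names):
    """Collect the first matching pattern for each name; names with no match are skipped."""
    new_list = []
    for name in names:
        for p in patterns:
            if p in name:
                new_list.append(p)
                break
    return new_list

# Write a small sample file so the sketch runs on its own (assumed file name).
with open("patterns.txt", "w", encoding="utf-8") as f:
    f.write("杰伦\n俊杰\n")

name_list = ["林俊杰", "林宥嘉", "周杰伦", "宋祖英"]
pattern_list = load_patterns("patterns.txt")
new_list = matches_in_order(pattern_list, name_list)
print(new_list)
```

If the file was saved by an editor that adds a byte order mark, opening it with encoding="utf-8-sig" avoids a stray invisible character at the start of the first pattern.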