python, string, conditional-statements, logarithm

Python: string comparison with double conditions


I'm trying to search two lists for common strings. The first list is a file with text, while the second is a list of words, each preceded by its logarithmic probability. To match, a word not only needs to be in both lists but must also have a certain minimal log probability (for instance, between -2.123456 and 0.000000, that is, from negative 2 up to 0). The tab-separated list can look like this:

-0.962890   dog
-1.152454   lol
-2.050454   cat


I got stuck doing something like this:

common = []
for i in list1:
    if i in list2 and re.search("\-[0-1]\.[\d]+", list2):
        common.append(i)


Simply preprocessing the list to remove lines under a certain threshold is a valid idea, of course, but since both the word and its probability are on the same line, isn’t a single condition also possible? (Regexps aren’t necessary, but for comparison, solutions both with and without them would be interesting.)

EDIT: own answer to this question below.


Solution

  • Answering my own question after hours of trial and error and reading tips from here and there. It turns out I was thinking in the right direction from the start, but I needed to separate word detection from pattern matching and instead combine the pattern matching with the log probability check. That means first building a temporary collection of words whose log probability is in the required range, and then comparing the text file against it.

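        import re

        common = []
        prob = set()
        loga, rithmus = -9.87, -0.01   # lower and upper log-probability bounds

        # list2 is assumed to be the whole probability file read as one string;
        # each match is a (log probability, word) pair taken from one line
        for logprob, word in re.findall(r"(-\d+\.\d+)\s+(\S+)", list2):
            if loga < float(logprob) < rithmus:
                prob.add(word)

        # list1 is assumed to be an iterable of words from the text file;
        # set membership avoids substring hits such as "at" matching "cat"
        for i in list1:
            if i in prob:
                common.append(i)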
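
    Since the question also asked for a version without regexps, here is a minimal sketch of the same idea using only `str.split`. The function name `common_words`, its parameters, and the sample data are hypothetical, chosen for illustration. It assumes each line of the probability list holds exactly one number and one word separated by whitespace, and that the text has already been split into words.

```python
def common_words(text_words, prob_lines, low=-9.87, high=-0.01):
    """Return the words in text_words whose log probability lies strictly between low and high."""
    allowed = set()
    for line in prob_lines:
        parts = line.split()          # splits on tabs or spaces
        if len(parts) != 2:
            continue                  # skip blank or malformed lines
        logprob, word = parts
        if low < float(logprob) < high:
            allowed.add(word)
    # Keep the order of the text; exact word match, not substring match
    return [w for w in text_words if w in allowed]


# Sample data modelled on the list in the question
prob_lines = ["-0.962890\tdog", "-1.152454\tlol", "-2.050454\tcat"]
text_words = ["the", "dog", "chased", "the", "cat"]

print(common_words(text_words, prob_lines, low=-2.0, high=0.0))  # ['dog']
```

    With the bounds set to -2.0 and 0.0, "cat" is dropped because its log probability (-2.050454) falls below the lower bound, even though the word appears in both lists.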