Search code examples
pythondataframematheval

Is there an ALL(String, List) function to find and return all values in the list containing the String in Python?


I am trying to apply an expression to a DataFrame column which uses an ALL(String*) function to find all Strings separated by comma and do some other mathematical operation such as mean().

So I have a dictionary with key = expression name and value = expression as such:

param_exp = {'AVG.P99' : '@df.mean(ALL(P99.*))', ...}

df_filtered.columns = ['P99.1', 'P99.2', 'P99.3',...]

Finally I want to return param_exp = {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3,...)), ...}

Here is what I have so far:

for key, value in param_exp.items():
    list = []
    string = ""
    for column in df_filtered:
        if "ALL" in str(value):
            search_parm = param_exp[key].split("ALL(")[1]
            search_parm = search_parm.split("*")[0]
            # print(search_parm)
            if search_parm in column.split(search_parm)[0]:
                list.append(column)
    string = ",".join(list)
    string = param_exp[key].split("ALL(")[0] + str(string) + param_exp[key].split("*")[1]
    string = string.strip("ALL").strip("*")
    try:
        param_exp[key] = string
    except:
        pass
print(param_exp)

This has not been working out and gives an index error:

    string = param_exp[key].split("ALL(")[0] + str(string) + param_exp[key].split("*")[1]
IndexError: list index out of range

Just wondering if there is an easier way to do this. Thanks.


Solution

  • You may have string which doesn't have * so split("*") may get only one string but you use [1] to get second string.

    So you may need

    if "ALL(" in value and "*)" in value:
    

    May version with other changes:

    • list is reserved word to create list so I use variable selected
    • I use value instead of param_exp[key]
    • during split I keep value which is before ALL( and after *) so I don't have to split it again to creat string
    df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3']
    param_exp = {'AVG.P99' : '@df.mean(ALL(P99.*))'}
    
    for key, value in param_exp.items():
    
        if "ALL(" in value and "*)" in value:
            before, rest = value.split("ALL(")
            search_parm, after = rest.split("*)")
        
            print('search_parm:', search_parm)
    
            selected = []
            for column in df_filtered:
                if search_parm in column:
                    selected.append(column)
    
            print('selected:', selected)
            
            string = ",".join(selected)
            string = before + string + after
            param_exp[key] = string
        
    print('param_exp:', param_exp)
    

    Result:

    search_parm: P99.
    selected: ['P99.1', 'P99.2', 'P99.3']
    param_exp: {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3)'}
    

    Maybe it could be better to use module fnmatch which can use * (any string, even empty) and ? (exactly one char, not empty) to search matching strings and then you could use i.e P?9.* or *.1. It is used by module glob to search filenames.

    import fnmatch
    
    df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3', 'P1.1']
    param_exp = {
        'AVG.P99' : '@df.mean(ALL(P99.*))',
        'AVG.P??.1' : '@df.mean(ALL(P??.1))',
        'AVG.P*.1' : '@df.mean(ALL(P*.1))',
    }
    
    for key, value in param_exp.items():
    
        if "ALL(" in value and ")" in value:
            before, rest = value.split("ALL(", 1)
            search_parm, after = rest.split(")", 1)
        
            print('search_parm:', search_parm)
    
            selected = fnmatch.filter(df_filtered, search_parm)
            
            print('selected:', selected)
            
            string = ",".join(selected)
            string = before + string + after
            param_exp[key] = string
        
    print('param_exp:', param_exp)
    

    Result:

    search_parm: P99.*
    selected: ['P99.1', 'P99.2', 'P99.3']
    
    search_parm: P??.1
    selected: ['P99.1', 'P11.1']
    
    search_parm: P*.1
    selected: ['P99.1', 'P11.1', 'P1.1']
    
    param_exp: {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3)', 'AVG.P??.1': '@df.mean(P99.1,P11.1)', 'AVG.P*.1': '@df.mean(P99.1,P11.1,P1.1)'}
    

    EDIT:

    It can be even simpler if you use f-string and put {ALL("P99.*"")} - so it will try to run function ALL() which you have to define.

    But f-string is executed automatically and it can't use string from user input or from file/database

    import fnmatch
    
    def ALL(pattern):
        print('pattern:', pattern)
        selected = fnmatch.filter(df_filtered, pattern)
        print('selected:', selected)
        return ",".join(selected)
            
    df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3']
    param_exp = {'AVG.P99' : f'@df.mean({ALL("P99.*")})'}
        
    print('param_exp:', param_exp)