Is there an ALL(String, List) function to find and return all values in the list containing the String in Python?

I am trying to apply an expression to a DataFrame column which uses an ALL(String*) function to find all Strings separated by comma and do some other mathematical operation such as mean().

So I have a dictionary with key = expression name and value = expression as such:

param_exp = {'AVG.P99' : '@df.mean(ALL(P99.*))', ...}

df_filtered.columns = ['P99.1', 'P99.2', 'P99.3',...]

Finally I want to return param_exp = {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3,...)), ...}

Here is what I have so far:

for key, value in param_exp.items():
    list = []
    string = ""
    for column in df_filtered:
        if "ALL" in str(value):
            search_parm = param_exp[key].split("ALL(")[1]
            search_parm = search_parm.split("*")[0]
            # print(search_parm)
            if search_parm in column.split(search_parm)[0]:
                list.append(column)
    string = ",".join(list)
    string = param_exp[key].split("ALL(")[0] + str(string) + param_exp[key].split("*")[1]
    string = string.strip("ALL").strip("*")
    try:
        param_exp[key] = string
    except:
        pass
print(param_exp)

This has not been working out and gives an index error:

    string = param_exp[key].split("ALL(")[0] + str(string) + param_exp[key].split("*")[1]
IndexError: list index out of range

Just wondering if there is an easier way to do this. Thanks.

Solution

You may have string which doesn't have * so split("*") may get only one string but you use [1] to get second string.

So you may need

if "ALL(" in value and "*)" in value:

May version with other changes:

list is reserved word to create list so I use variable selected
I use value instead of param_exp[key]
during split I keep value which is before ALL( and after *) so I don't have to split it again to creat string

df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3']
param_exp = {'AVG.P99' : '@df.mean(ALL(P99.*))'}

for key, value in param_exp.items():

    if "ALL(" in value and "*)" in value:
        before, rest = value.split("ALL(")
        search_parm, after = rest.split("*)")
    
        print('search_parm:', search_parm)

        selected = []
        for column in df_filtered:
            if search_parm in column:
                selected.append(column)

        print('selected:', selected)
        
        string = ",".join(selected)
        string = before + string + after
        param_exp[key] = string
    
print('param_exp:', param_exp)

Result:

search_parm: P99.
selected: ['P99.1', 'P99.2', 'P99.3']
param_exp: {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3)'}

Maybe it could be better to use module fnmatch which can use * (any string, even empty) and ? (exactly one char, not empty) to search matching strings and then you could use i.e P?9.* or *.1. It is used by module glob to search filenames.

import fnmatch

df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3', 'P1.1']
param_exp = {
    'AVG.P99' : '@df.mean(ALL(P99.*))',
    'AVG.P??.1' : '@df.mean(ALL(P??.1))',
    'AVG.P*.1' : '@df.mean(ALL(P*.1))',
}

for key, value in param_exp.items():

    if "ALL(" in value and ")" in value:
        before, rest = value.split("ALL(", 1)
        search_parm, after = rest.split(")", 1)
    
        print('search_parm:', search_parm)

        selected = fnmatch.filter(df_filtered, search_parm)
        
        print('selected:', selected)
        
        string = ",".join(selected)
        string = before + string + after
        param_exp[key] = string
    
print('param_exp:', param_exp)

Result:

search_parm: P99.*
selected: ['P99.1', 'P99.2', 'P99.3']

search_parm: P??.1
selected: ['P99.1', 'P11.1']

search_parm: P*.1
selected: ['P99.1', 'P11.1', 'P1.1']

param_exp: {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3)', 'AVG.P??.1': '@df.mean(P99.1,P11.1)', 'AVG.P*.1': '@df.mean(P99.1,P11.1,P1.1)'}

EDIT:

It can be even simpler if you use f-string and put {ALL("P99.*"")} - so it will try to run function ALL() which you have to define.

But f-string is executed automatically and it can't use string from user input or from file/database

import fnmatch

def ALL(pattern):
    print('pattern:', pattern)
    selected = fnmatch.filter(df_filtered, pattern)
    print('selected:', selected)
    return ",".join(selected)
        
df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3']
param_exp = {'AVG.P99' : f'@df.mean({ALL("P99.*")})'}
    
print('param_exp:', param_exp)