I am trying to apply an expression to a DataFrame column which uses an ALL(String*)
function to find all Strings separated by comma and do some other mathematical operation such as mean()
.
So I have a dictionary with key = expression name
and value = expression
as such:
param_exp = {'AVG.P99' : '@df.mean(ALL(P99.*))', ...}
df_filtered.columns = ['P99.1', 'P99.2', 'P99.3',...]
Finally I want to return param_exp = {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3,...)), ...}
Here is what I have so far:
for key, value in param_exp.items():
list = []
string = ""
for column in df_filtered:
if "ALL" in str(value):
search_parm = param_exp[key].split("ALL(")[1]
search_parm = search_parm.split("*")[0]
# print(search_parm)
if search_parm in column.split(search_parm)[0]:
list.append(column)
string = ",".join(list)
string = param_exp[key].split("ALL(")[0] + str(string) + param_exp[key].split("*")[1]
string = string.strip("ALL").strip("*")
try:
param_exp[key] = string
except:
pass
print(param_exp)
This has not been working out and gives an index error:
string = param_exp[key].split("ALL(")[0] + str(string) + param_exp[key].split("*")[1]
IndexError: list index out of range
Just wondering if there is an easier way to do this. Thanks.
You may have string which doesn't have *
so split("*")
may get only one string but you use [1]
to get second string.
So you may need
if "ALL(" in value and "*)" in value:
May version with other changes:
list
is reserved word to create list so I use variable selected
value
instead of param_exp[key]
ALL(
and after *)
so I don't have to split it again to creat string
df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3']
param_exp = {'AVG.P99' : '@df.mean(ALL(P99.*))'}
for key, value in param_exp.items():
if "ALL(" in value and "*)" in value:
before, rest = value.split("ALL(")
search_parm, after = rest.split("*)")
print('search_parm:', search_parm)
selected = []
for column in df_filtered:
if search_parm in column:
selected.append(column)
print('selected:', selected)
string = ",".join(selected)
string = before + string + after
param_exp[key] = string
print('param_exp:', param_exp)
Result:
search_parm: P99.
selected: ['P99.1', 'P99.2', 'P99.3']
param_exp: {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3)'}
Maybe it could be better to use module fnmatch
which can use *
(any string, even empty) and ?
(exactly one char, not empty) to search matching strings and then you could use i.e P?9.*
or *.1
. It is used by module glob
to search filenames.
import fnmatch
df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3', 'P1.1']
param_exp = {
'AVG.P99' : '@df.mean(ALL(P99.*))',
'AVG.P??.1' : '@df.mean(ALL(P??.1))',
'AVG.P*.1' : '@df.mean(ALL(P*.1))',
}
for key, value in param_exp.items():
if "ALL(" in value and ")" in value:
before, rest = value.split("ALL(", 1)
search_parm, after = rest.split(")", 1)
print('search_parm:', search_parm)
selected = fnmatch.filter(df_filtered, search_parm)
print('selected:', selected)
string = ",".join(selected)
string = before + string + after
param_exp[key] = string
print('param_exp:', param_exp)
Result:
search_parm: P99.*
selected: ['P99.1', 'P99.2', 'P99.3']
search_parm: P??.1
selected: ['P99.1', 'P11.1']
search_parm: P*.1
selected: ['P99.1', 'P11.1', 'P1.1']
param_exp: {'AVG.P99': '@df.mean(P99.1,P99.2,P99.3)', 'AVG.P??.1': '@df.mean(P99.1,P11.1)', 'AVG.P*.1': '@df.mean(P99.1,P11.1,P1.1)'}
EDIT:
It can be even simpler if you use f-string
and put {ALL("P99.*"")}
- so it will try to run function ALL()
which you have to define.
But f-string
is executed automatically and it can't use string from user input or from file/database
import fnmatch
def ALL(pattern):
print('pattern:', pattern)
selected = fnmatch.filter(df_filtered, pattern)
print('selected:', selected)
return ",".join(selected)
df_filtered = ['P99.1', 'P99.2', 'P99.3', 'P11.1', 'P11.2', 'P11.3']
param_exp = {'AVG.P99' : f'@df.mean({ALL("P99.*")})'}
print('param_exp:', param_exp)