Say I have a Python class with a pandas DataFrame df as an attribute. I want to query df by running one or more predefined queries, using a method that takes one or more query handles as arguments:
import pandas as pd
import numpy as np

class doorn:
    def __init__(self):
        self.name = 'foo'
        self.df = pd.DataFrame(data={'A':np.arange(0, 10), 'B':np.arange(5, 15), 'C':np.arange(14, 24)}, index=[x for x in range(0, 10)])

    def query_df(self, *query):
        # query arguments must be formatted as 'q1', 'q2' etc
        queries = [q for q in query]
        q1 = self.df.loc[self.df.A > 2].index
        q2 = self.df.loc[self.df.B < 13].index
        q3 = self.df.loc[self.df.C > 15].index
        sel_rows = set().union(*[eval(x, globals(), locals()) for x in queries])
        self.df = self.df.loc[sel_rows]
Now, it seems that eval cannot find the objects that the query strings refer to:
>>> foo = doorn()
>>> foo.query_df('q1', 'q2')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 17, in query_df
  File "<input>", line 17, in <listcomp>
  File "<string>", line 1, in <module>
NameError: name 'q1' is not defined
My guess is that q1, q2 and q3 are not present in the list comprehension's namespace, or something like that; I haven't really wrapped my head around namespaces yet. I've tried solving this by passing globals() and locals() as additional arguments to eval, as suggested in the docs, but without success.
How can I solve this? Can I avoid using eval altogether?
I think this is because the locals() called inside your list comprehension is not the same as the one in your function: the comprehension has its own scope, so its locals() doesn't contain 'q1'. You could use global variables instead, but I would not recommend this.
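To illustrate the scoping point, here is a minimal sketch that captures the function's namespace once, outside the comprehension, and hands that dictionary to eval. The function name lookup_by_name and the sample sets are made up for this example. It only shows where the names live; it is not a recommendation to keep eval.

```python
def lookup_by_name(*names):
    # Stand-ins for the q1/q2 index objects from the question (hypothetical data).
    q1 = {3, 4}
    q2 = {0, 1}
    # Capture the function's local namespace here, outside the comprehension.
    scope = locals()
    # Each name is now resolved against the captured dictionary.
    return [eval(name, {}, scope) for name in names]


found = lookup_by_name('q1', 'q2')
print(found)
```

Because scope is built in the function body, it contains q1 and q2 regardless of how the comprehension itself is scoped.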
Moreover, using eval on something that may come from user input can be hazardous, as it can execute malicious code.
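As a harmless illustration of that risk, eval will run a function call that has nothing to do with looking up a query. The string below is a made-up stand-in for user input.

```python
import os

# Hypothetical user-supplied "query handle" that is really an expression.
user_input = "__import__('os').getpid()"

# eval does not look up a name here; it executes the whole expression.
result = eval(user_input)
print(result == os.getpid())
```

A call that deletes files or reads secrets would be executed in exactly the same way.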
I suggest storing your predefined queries in a dictionary, as in this example:
import numpy as np
import pandas as pd

class doorn:
    def __init__(self):
        self.name = 'foo'
        self.df = pd.DataFrame(data={'A':np.arange(0, 10), 'B':np.arange(5, 15), 'C':np.arange(14, 24)}, index=[x for x in range(0, 10)])

    def query_df(self, *query):
        # query arguments must be formatted as 'q1', 'q2' etc
        queries = [q for q in query]
        # map each query handle to the index labels it selects
        possible_queries = {'q1' : self.df.loc[self.df.A > 2].index,
                            'q2' : self.df.loc[self.df.B < 13].index,
                            'q3' : self.df.loc[self.df.C > 15].index}
        sel_rows = set().union(*[possible_queries[x] for x in queries])
        # recent pandas versions reject a set as an indexer, so pass a sorted list
        self.df = self.df.loc[sorted(sel_rows)]
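A possible variation on the same idea, sketched under a few assumptions: the class name Doorn, the queries attribute and the ValueError for unknown handles are my own additions, not part of the original code. Here each handle maps to a small function that builds a boolean mask, so only the requested queries are evaluated, and the masks are combined with | to get the union of rows.

```python
import pandas as pd


class Doorn:
    """Hypothetical variant of the class above; names are illustrative."""

    def __init__(self):
        self.df = pd.DataFrame(
            {"A": range(0, 10), "B": range(5, 15), "C": range(14, 24)}
        )
        # Each handle maps to a function that builds a boolean mask on demand.
        self.queries = {
            "q1": lambda df: df["A"] > 2,
            "q2": lambda df: df["B"] < 13,
            "q3": lambda df: df["C"] > 15,
        }

    def query_df(self, *handles):
        # Reject handles that are not predefined instead of evaluating them.
        unknown = [h for h in handles if h not in self.queries]
        if unknown:
            raise ValueError(f"unknown query handle(s): {unknown}")
        # Start from an all-False mask and OR in each requested query (union).
        mask = pd.Series(False, index=self.df.index)
        for handle in handles:
            mask = mask | self.queries[handle](self.df)
        self.df = self.df.loc[mask]
        return self.df


foo = Doorn()
result = foo.query_df("q1", "q3")
print(result)
```

Selecting with a boolean mask keeps the rows in their original order and avoids building a set of index labels.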
Hope this will help you.