I'm trying to query data using the Python pandas library. Here is an example JSON of the data:
[
  {
    "name": "Bob",
    "city": "NY",
    "status": "Active"
  },
  {
    "name": "Jake",
    "city": "SF",
    "status": "Active"
  },
  {
    "name": "Jill",
    "city": "NY",
    "status": "Lazy"
  },
  {
    "name": "Steve",
    "city": "NY",
    "status": "Lazy"
  }
]
My goal is to query the data where city == NY and status == Lazy. One way using pandas DataFrame is to do...
df = df[(df.status == "Lazy") & (df.city == "NY")]
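For reference, here is a self-contained sketch of that filter run against the sample data. The records are inlined as a string and read through `io.StringIO` instead of a file, purely so the snippet runs on its own; the variable names are illustrative.

```python
import io

import pandas as pd

# The sample records from above, inlined so no file is needed
json_text = """
[
  {"name": "Bob",   "city": "NY", "status": "Active"},
  {"name": "Jake",  "city": "SF", "status": "Active"},
  {"name": "Jill",  "city": "NY", "status": "Lazy"},
  {"name": "Steve", "city": "NY", "status": "Lazy"}
]
"""

df = pd.read_json(io.StringIO(json_text))

# Each comparison yields a boolean Series; & combines them element-wise
df = df[(df.status == "Lazy") & (df.city == "NY")]
print(df)
```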
This works fine, but I wanted something more abstract.
Is there a way I can use **kwargs to filter the data? So far I've had trouble finding this in the pandas documentation.
Here is what I've done so far:
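import sys
import pandas as pd

def main(**kwargs):
    # Load the JSON file named on the command line
    readJson = pd.read_json(sys.argv[1])
    # Narrow the frame one column/value pair at a time
    for key, value in kwargs.items():
        print(key, value)
        readJson = readJson[readJson[key] == value]
    print(readJson)

if __name__ == '__main__':
    main(status="Lazy", city="NY")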
Again, this works just fine, but I wonder if there is a better way to do it.
I don't really see anything wrong with your approach. If you wanted to use df.query
you could do something like this, although I'd argue it's less readable.
# Build e.g. "status == 'Lazy' and city == 'NY'"; !r quotes string values
expr = " and ".join("{} == {!r}".format(k, v) for k, v in kwargs.items())
readJson = readJson.query(expr)
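Another option, shown here only as a minimal sketch: build one boolean mask per keyword, AND them together, and index the frame once instead of reassigning it inside a loop. This assumes every keyword names an existing column and that exact equality is what you want. The helper name `filter_by` and the inline sample frame are made up for illustration.

```python
from functools import reduce
import operator

import pandas as pd


def filter_by(df, **kwargs):
    """Return the rows of df where every column named in kwargs equals its value."""
    if not kwargs:
        return df  # nothing to filter on
    # One boolean Series per keyword, e.g. df["city"] == "NY"
    masks = (df[key] == value for key, value in kwargs.items())
    # AND all masks together, then index the frame a single time
    return df[reduce(operator.and_, masks)]


# Sample records from the question, inlined for illustration
df = pd.DataFrame([
    {"name": "Bob", "city": "NY", "status": "Active"},
    {"name": "Jake", "city": "SF", "status": "Active"},
    {"name": "Jill", "city": "NY", "status": "Lazy"},
    {"name": "Steve", "city": "NY", "status": "Lazy"},
])

result = filter_by(df, status="Lazy", city="NY")
print(result)
```

Unlike the string-based query, this never builds an expression from the values, so quotes or other special characters inside a value need no special handling.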