I have data that looks like this:
data = {"doc1": {'a': 2, 'b': 1, 'c': 3}, "doc2": {'a': 1, 'b': 1, 'c': 3}, "doc3": {'a': 1, 'b': 1, 'c': 3}}
I convert it into a DataFrame:
import pandas as pd
df = pd.DataFrame.from_dict(data, orient='index')
The DataFrame looks like this:
      a  c  b
doc1  2  3  1
doc2  1  3  1
doc3  1  3  1
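A self-contained version of the setup above, for anyone who wants to reproduce it. This is a minimal sketch assuming a recent pandas release; the column order you see may differ from the a, c, b shown above, since older Python versions did not preserve dict insertion order.

```python
import pandas as pd

data = {"doc1": {'a': 2, 'b': 1, 'c': 3},
        "doc2": {'a': 1, 'b': 1, 'c': 3},
        "doc3": {'a': 1, 'b': 1, 'c': 3}}

# orient='index' makes each outer key (doc1, doc2, doc3) a row label
# and each inner key (a, b, c) a column.
df = pd.DataFrame.from_dict(data, orient='index')
print(df)
```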
Now I want to sum all the values in column b where the value in column a is 1.
For this data the result should be 2 (the b values of doc2 and doc3).
Is there an easy way to do this without iterating through both columns? I checked other posts and found this, which uses the .loc indexer:
df.loc[df['a'] == 1, 'b'].sum()
But for some reason, I can't seem to make it work with my DataFrame.
Please let me know.
Thanks.
You are very close. See below.
>>> df[df['a'] == 1]['b'].sum()
2
Instead of using .loc, try filtering the DataFrame first (df[df['a'] == 1]), then selecting column 'b', and then summing.
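Broken into separate steps, the filter-then-select approach looks roughly like this. It is a sketch using the same data as the question; the intermediate names mask and filtered are just illustrative.

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {"doc1": {'a': 2, 'b': 1, 'c': 3},
     "doc2": {'a': 1, 'b': 1, 'c': 3},
     "doc3": {'a': 1, 'b': 1, 'c': 3}},
    orient='index')

# Step 1: boolean mask, True for rows where column 'a' equals 1.
mask = df['a'] == 1  # doc1 False, doc2 True, doc3 True

# Step 2: keep only the matching rows (doc2 and doc3).
filtered = df[mask]

# Step 3: take column 'b' from the filtered rows and sum it.
total = filtered['b'].sum()
print(total)  # 2
```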
Edit: I'll leave this here for future reference, although depending on the version of pandas you're using, your solution should work (thanks, @maxymoo). I'm running 0.18.1, and both approaches worked.
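A quick check that the two spellings agree, assuming a current pandas release. df.loc[mask, 'b'] selects the rows and the column in a single indexing call, while df[mask]['b'] chains two selections; for a read-only sum like this, both should give the same value.

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {"doc1": {'a': 2, 'b': 1, 'c': 3},
     "doc2": {'a': 1, 'b': 1, 'c': 3},
     "doc3": {'a': 1, 'b': 1, 'c': 3}},
    orient='index')

mask = df['a'] == 1

# Single indexing call: rows chosen by the mask, column 'b'.
with_loc = df.loc[mask, 'b'].sum()

# Chained indexing: filter the rows first, then take column 'b'.
chained = df[mask]['b'].sum()

print(with_loc, chained)  # 2 2
```

Chained indexing is fine for reading values like this. When assigning, though, .loc is the safer choice, because assigning through a chained selection may modify a temporary copy rather than df (the situation pandas flags with SettingWithCopyWarning).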