We have just learnt how to filter pandas DataFrames in Python, so I thought I would try this out on a public data set (http://data.wa.aemo.com.au/#stem-bids-and-offers).
I used August's data for this.
The challenge I set myself was to keep ONLY the rows where $/MWh > 0 AND the row is a bid. We have learnt how to use np.logical_and to filter, but the problem I found is that I can filter on EITHER a numeric condition OR a text condition, not both.
I have an approach which works and gets me the data and visualisation I'm after, but I'm certain there is a much more efficient way of filtering by text and numeric fields. The problem with my approach is that it only works if the string lengths are different. For example, if the column contained both "Bid" and "Fib", I would pick up both, because each is three characters long. I only want to pick up "Bid". Could anyone please point me in the right direction?
Here is my code:
#Task: I want to keep ONLY the bids with a positive $/MWh price
#This requires 2 filters - 1 to select $/MWh > 0 and 1 to select Bids
# Try converting this to a numpy array and using the filtering mechanisms there
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.read_csv('stem-bids-and-offers-2017-08.csv')
df.head(5)
#I don't know how to filter by 'text' just yet so I will have to use another way, which is using the len function
#This will reduce the bid/offer field to its character count ('Bid' -> 3, 'Offer' -> 5)
df['boLength'] = df['Bid or Offer'].apply(len)
df.head(5)
filtByPriceBid = np.logical_and(df['Price ($/MWh)'] > 0, df['boLength'] == 3)
filtByPriceBid.head(5)
df2 = df[filtByPriceBid]
df2.head(10)
sns.kdeplot(df2['Price ($/MWh)'], shade=True)
PS: I attached the KDE plot which came out of this. If anyone wants to provide an interpretation of it as well, please feel free to do so! I was expecting a normal distribution but unfortunately, this is not the case.
I hope this is what you are looking for.
You could use & to combine multiple filters, wrapping each condition in parentheses:
sns.kdeplot(df[(df['Price ($/MWh)'] > 0) & (df['Bid or Offer']=='Bid')]['Price ($/MWh)'], shade=True)
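To make the one-liner above easier to follow, here is a minimal sketch that builds each filter as a named boolean mask first. The small DataFrame is a made-up stand-in for the AEMO file (only the two column names are taken from the question; the values, including the "Fib" row, are invented for illustration), so treat it as a sketch rather than the exact result you will see on the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the AEMO CSV: column names come from the question,
# the values are invented for illustration only.
df = pd.DataFrame({
    "Bid or Offer": ["Bid", "Offer", "Bid", "Fib", "Bid"],
    "Price ($/MWh)": [35.0, 50.0, -10.0, 20.0, 0.0],
})

# Each comparison returns a boolean Series, whether the column is numeric or text.
is_positive = df["Price ($/MWh)"] > 0
is_bid = df["Bid or Offer"] == "Bid"

# Combine the two masks element-wise with &.
# When the comparisons are written inline, each one needs its own parentheses
# because & binds more tightly than > and ==.
positive_bids = df[is_positive & is_bid]

# np.logical_and accepts the same two masks and selects the same rows.
same_rows = df[np.logical_and(is_positive, is_bid)]

print(positive_bids)
```

Because the text comparison is an exact match on "Bid", a three-letter value such as "Fib" is no longer picked up, which was the weakness of the len-based approach. The filtered price column can then be passed to sns.kdeplot as before.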