I have a dataset history of temperature/weather by city like this:
{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Pekin", "date": "2016-10-16", "temperature": "19", "weather": "cloudy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "sunny"}
And I would like to create a top 5, order by best average temperature.In this result I would like to know the number of day by weather type(sunny cloudy rainy)
Example:
Rank - City - Average Temperature - Cloudy days - Sunny days - Rainy Days
1 - Barcelona - 20 - 93 - 298 - 29
How can doing this in Python?
Thanks
Matt
I believe you need pandas:
read_json
for DataFrame
from json
groupby
with aggregate mean
and nlargest
boolean indexing
count
s, reshape by unstack
reindex
by index
of s
for correct orderinginsert
with map
range
import pandas as pd
import pandas as pd
df = pd.read_json('a.json', lines=True)
print (df)
city date temperature weather
0 Barcelona 2016-10-16 13 cloudy
1 Berlin 2016-10-16 -1 sunny
2 Pekin 2016-10-16 19 cloudy
3 Paris 2016-10-16 -8 sunny
s = df.groupby(['city'])['temperature'].mean().nlargest(5)
print (s)
city
Pekin 19
Barcelona 13
Berlin -1
Paris -8
Name: temperature, dtype: int64
df2 = (df[df['city'].isin(s.index)]
.groupby(['city', 'weather'])['temperature']
.size()
.unstack(fill_value=0)
.add_suffix(' days')
.reindex(s.index)
.reset_index()
.rename_axis(None, axis=1))
df2.insert(1, 'temp avg', df2['city'].map(s))
df2.insert(0, 'rank', range(1, len(df2) + 1))
print (df2)
rank city temp avg cloudy days sunny days
0 1 Pekin 19 1 0
1 2 Barcelona 13 1 0
2 3 Berlin -1 0 1
3 4 Paris -8 0 1