I have a whole year of data sampled every fifteen minutes and want to create a histogram whose counts are in hours rather than in fifteen-minute samples.
I have the following toy example code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv(r"D:/tmp/load.csv")
df.hist(bins=range(20,80,5))
sns.set()
plt.xlabel("Value")
plt.ylabel("count")
plt.show()
Which produces the following graph.
The data within the DataFrame is of the form:
>>> df[(df["Time"] > "2021-04-10 19:45:00") & (df["Time"] < "2021-04-10 21:00:00")]
Time tag
9584 2021-04-10 20:00:00 50.3840
9585 2021-04-10 20:15:00 37.8332
9586 2021-04-10 20:30:00 36.6808
9587 2021-04-10 20:45:00 37.1840
I need to change the y-axis values of the histogram so that they show a count of hours rather than a count of fifteen-minute samples. For the first column I should see 10 (40/4) instead of 40, so the whole y-axis should be divided by 4.
How can I perform the scaling of the y-axis in the histogram?
Should I somehow use the plt.yticks function?
Here is my take on your interesting question.
I don't know of a way to rescale the y-axis after having plotted the dataframe, but you can rescale the dataframe itself.
For instance, in the following toy dataframe, with one measurement every 15 minutes, 9 values fall between 35 and 40:
import pandas as pd
df = pd.DataFrame(
    {
        "index": [
            "2021-04-10 20:00:00",
            "2021-04-10 20:15:00",
            "2021-04-10 20:30:00",
            "2021-04-10 20:45:00",
            "2021-04-10 21:00:00",
            "2021-04-10 21:15:00",
            "2021-04-10 21:30:00",
            "2021-04-10 21:45:00",
            "2021-04-10 22:00:00",
            "2021-04-11 22:15:00",
            "2021-04-11 22:30:00",
            "2021-04-11 22:45:00",
            "2021-04-11 23:00:00",
            "2021-04-11 23:15:00",
            "2021-04-11 23:30:00",
            "2021-04-11 23:45:00",
        ],
        "tag": [39, 36, 36, 37, 42, 28, 39, 54, 43, 38, 39, 36, 44, 27, 38, 28],
    },
)
df["index"] = pd.to_datetime(df["index"], format="%Y-%m-%d %H:%M:%S")
Here is the corresponding plot:
df.copy().set_index("index").plot(
    kind="hist", bins=range(20, 80, 5), yticks=range(0, 10), grid=True
)
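The bar heights can also be checked without reading them off the plot. The following is a minimal sketch that uses `numpy.histogram` on the same `tag` values and the same bin edges; the names `edges` and `per_bin` are only illustrative.

```python
import numpy as np

# Same values as the "tag" column of the toy dataframe.
tags = [39, 36, 36, 37, 42, 28, 39, 54, 43, 38, 39, 36, 44, 27, 38, 28]

# Same bin edges as range(20, 80, 5) in the plot call.
edges = np.arange(20, 80, 5)
counts, _ = np.histogram(tags, bins=edges)

# Map each "low-high" bin label to its number of 15-minute samples.
per_bin = {
    f"{lo}-{hi}": int(n) for lo, hi, n in zip(edges[:-1], edges[1:], counts)
}
print(per_bin["35-40"])  # 9
```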
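Had the measurements been hourly, only 4 values would have been found in the 35-40 bin, one for each hour in which a reading fell in that range:
So, rescaling the dataframe to an hourly basis amounts to keeping a single row per bin, date and hour: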
_ = (
    df.assign(
        # Label each row with its value bin and with the hour it belongs to
        bin=pd.cut(df["tag"], bins=range(20, 60, 5)),
        date=df["index"].dt.date,
        hour=df["index"].dt.hour,
    )
    .sort_values(by=["bin", "date", "hour"])
    # Keep only one row per (bin, date, hour) combination
    .drop_duplicates(subset=["bin", "date", "hour"], keep="first")
    .drop(columns=["bin", "date", "hour"])
    .set_index("index")
    .plot(kind="hist", bins=range(20, 80, 5), yticks=range(0, 5), grid=True)
)
This outputs a histogram in which the 35-40 bin holds 4 values instead of 9.
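If what is wanted is literally the 15-minute counts divided by 4, as in the question, another option is the `weights` argument of matplotlib's `hist`, which lets each sample count for a quarter of an hour. This is only a sketch and not the same computation as the deduplication approach: it sums quarter hours per bin, so the results can be fractional. The helper name `plot_hourly_hist` and its `samples_per_hour` parameter are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_hourly_hist(values, bins, samples_per_hour=4, ax=None):
    """Draw a histogram whose bar heights are in hours, not in samples."""
    if ax is None:
        _, ax = plt.subplots()
    values = np.asarray(values, dtype=float)
    # Each sample stands for 1 / samples_per_hour of an hour.
    weights = np.full(values.shape, 1.0 / samples_per_hour)
    counts, edges, _ = ax.hist(values, bins=list(bins), weights=weights)
    ax.set_xlabel("Value")
    ax.set_ylabel("Hours")
    return counts, edges


# Same values as the "tag" column of the toy dataframe.
tags = [39, 36, 36, 37, 42, 28, 39, 54, 43, 38, 39, 36, 44, 27, 38, 28]
counts, edges = plot_hourly_hist(tags, bins=range(20, 80, 5))
print(counts[3])  # 2.25, i.e. the 9 samples of the 35-40 bin divided by 4
```

With the real data, the same call would take `df["tag"]` as `values`, followed by `plt.show()`. Relabelling the ticks of an already drawn plot with `matplotlib.ticker.FuncFormatter` would be another route, but weighting the samples keeps the bar heights and the axis consistent.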