I have created a webpage that extracts data from a database and shows it as a line chart using Streamlit. I recently found out that I can cache repetitive data using @st.cache_data, but I don't understand what the proper syntax should be in my code so that it doesn't throw any errors. Below is how I have tried to implement it. It throws Plotly-related errors that only started after I introduced the caching, so I think it has something to do with the way my code handles datafr_creator(). What is the proper way to cache a dataframe? I couldn't find other implementations where a for loop is used.
#initialize dataframe
df = pd.DataFrame()

@st.cache_data(ttl=120)
def datafr_creator():
    global df
    for row in supabaseList:
        row["created_at"] = row["created_at"].split(".")[0]
        row["time"] = row["created_at"].split("T")[1]
        row["date"] = row["created_at"].split("T")[0]
        row["DateTime"] = row["created_at"]
        df = df.append(row, ignore_index=True)
    return (df)

datafr_creator()
#creating local list
#Display section
orignal_title = '<h1 style="font-family:Helvetica; color:Black; font-size: 45px; text-align: center">Tridev Water Monitoring System</p>'
st.markdown(orignal_title, unsafe_allow_html=True)
st.text("")
fig = px.area(df, x="DateTime", y="water_level", title='',markers=False)
fig.update_layout(
    title={
        'text': "Water level in %",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(yaxis_range = [0,120])
fig.update_layout(xaxis_range = custom_range)
#Add Horizontal line in plotly chart for pump trigger level
fig.add_hline(y=80, line_width=3, line_color="black",
    annotation_text="Pump Start Level",
    annotation_position="top left",
    annotation_font_size=15,
    annotation_font_color="black"
    )
#Final Chart print
st.plotly_chart(fig,use_container_width=True)
It is currently throwing this error:
ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of [] but received: DateTime
I want to cache the dataframe so that the code doesn't have to fetch the data from the database and then process it inside the for loop on every run, all of which is time-consuming.
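The problem is that df does not contain a column named "DateTime" when it reaches this line: fig = px.area(df, x="DateTime", y="water_level", title='',markers=False). The "Expected one of []" part of the error shows that the dataframe has no columns at all, so it must be empty. The most likely reason is that st.cache_data only stores the function's return value: when a cached result is reused, the function body is skipped, so the assignment to the global df never happens and the df = pd.DataFrame() created at the top of the script stays empty.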
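To make the effect easier to see without a running Streamlit app, here is a minimal sketch that uses functools.lru_cache as a stand-in for st.cache_data. That substitution is an assumption made purely for illustration, and the names rows, load_rows and simulate_rerun are hypothetical. The point it shows is that a cache hit returns the stored value without running the function body, so anything the body would have written to a global is lost.

```python
from functools import lru_cache

# Module-level variable that the function tries to fill as a side effect,
# mirroring the `global df` pattern from the question.
rows = []
call_count = 0


@lru_cache(maxsize=None)  # stand-in for @st.cache_data (assumption)
def load_rows():
    global rows, call_count
    call_count += 1
    rows = [{"DateTime": "2023-05-01T12:34:56", "water_level": 42}]
    return tuple(r["water_level"] for r in rows)


def simulate_rerun():
    """Imitate one script rerun: the script starts again from the top."""
    global rows
    rows = []              # like `df = pd.DataFrame()` at the top of the script
    result = load_rows()   # cache hit on every run after the first
    return result, list(rows)


first_result, first_rows = simulate_rerun()
second_result, second_rows = simulate_rerun()

print(call_count)     # the body ran only once
print(first_rows)     # filled, because the body executed
print(second_rows)    # empty, because the cached call skipped the body
print(second_result)  # the return value is still available
```

On the second simulated rerun the global is empty while the return value is intact, which is why the result of the cached function should be assigned to a variable instead of relying on a global.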
Moreover, you shouldn't use global with the cache; it is not necessary. Simply decorate the datafr_creator function with the cache decorator, call it, and assign its return value to a variable, as shown in the Streamlit docs.
This gives:
import pandas as pd
import plotly.express as px
import streamlit as st

# supabaseList and custom_range are assumed to be defined earlier in your script.

@st.cache_data(ttl=120)
def datafr_creator():
    rows = []
    for row in supabaseList:
        row["created_at"] = row["created_at"].split(".")[0]
        row["time"] = row["created_at"].split("T")[1]
        row["date"] = row["created_at"].split("T")[0]
        row["DateTime"] = row["created_at"]
        rows.append(row)
    # DataFrame.append was removed in pandas 2.0, so build the frame in one go.
    return pd.DataFrame(rows)

# Here, it will call datafr_creator and use the cache if the
# dataframe has already been generated.
df = datafr_creator()

orignal_title = '<h1 style="font-family:Helvetica; color:Black; font-size: 45px; text-align: center">Tridev Water Monitoring System</h1>'
st.markdown(orignal_title, unsafe_allow_html=True)
st.text("")

fig = px.area(df, x="DateTime", y="water_level", title='', markers=False)
fig.update_layout(
    title={
        'text': "Water level in %",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'
    }
)
fig.update_layout(yaxis_range=[0, 120])
fig.update_layout(xaxis_range=custom_range)

# Add Horizontal line in plotly chart for pump trigger level
fig.add_hline(
    y=80, line_width=3, line_color="black",
    annotation_text="Pump Start Level",
    annotation_position="top left",
    annotation_font_size=15,
    annotation_font_color="black"
)

# Final Chart print
st.plotly_chart(fig, use_container_width=True)
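If you want to check the string handling from the loop in datafr_creator on its own, a small sketch like the following may help. The helper name with_time_columns and the sample record are hypothetical, and the timestamp format is an assumption based on the split calls in the question. Unlike the loop above, it returns a new dictionary instead of modifying the row in place.

```python
def with_time_columns(row):
    """Return a copy of `row` with time, date and DateTime fields added."""
    created_at = row["created_at"].split(".")[0]  # drop fractional seconds
    date, time = created_at.split("T")            # "YYYY-MM-DD", "HH:MM:SS"
    return {
        **row,
        "created_at": created_at,
        "date": date,
        "time": time,
        "DateTime": created_at,
    }


# Hypothetical sample record; the real rows come from supabaseList.
sample = {"created_at": "2023-05-01T12:34:56.789012+00:00", "water_level": 42}
processed = with_time_columns(sample)

print(processed["DateTime"])
print(processed["date"], processed["time"])
```

A list of such dictionaries can be passed straight to pd.DataFrame to build the frame in one step.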