I'm exporting datasets from equipment logging software and am trying to use Bokeh (Python) as an interactive visual aide during analysis. Everything is working fine, except for the date/time which refuses to be imported in its current format (24/08/2022 01:40:32). I have data for every second for at least a month's worth (So dropping the date wouldn't work).
I've been playing about with Bokeh for a while now by simply ignoring the date/time and replacing it by a consecutive series (1,2,3...) and plotting it as such, but the time has come to fix my temporary solution and I just cant seem to figure out how to define the formatting or how to convert it. (Bokeh documentation)
Example code:
from bokeh.io import output_file, show # OUTPUT_FILE FOR EXPORT (NOT USED)
from bokeh.layouts import gridplot # MULTIPLOT
from bokeh.plotting import figure
from bokeh.palettes import Spectral4 # COLOUR PALETTE
import pandas as pd
import external_tags as tags # TAG DEFINITIONS USED FOR CSV IMPORTING
# import csv
df = pd.read_csv("AUGUST_PS_1MIN.csv") # testset with 1 min intervals
# TOOLS
TOOLS = "box_zoom, box_select, crosshair, reset, hover"
Figure_Title = "TESTING AUTOMATING IMPORT WITHOUT MANUAL TWEAKING"
line_width = 1.5
alpha = 1
height = 500
x = df[tags.Date_Time_UTC[0]]
# These just redirect to my imported tag definitions TAG = ["column name", "friendly name"]
fig1a = tags.PS_MH_LOAD
fig1b = tags.PS_MH_WINCH_PWR
fig1c = tags.PS_PWR_MSB1
fig1d = tags.PS_PWR_MSB2
# FIGURE A (TOP LEFT)
s1 = figure(sizing_mode="stretch_width", height=height, title="LOAD", tools=TOOLS, x_axis_type='datetime')
s1.line(x, df[fig1a[0]], color=Spectral4[0], alpha=alpha, line_width=line_width, legend_label=fig1a[1])
s1.line(x, df[fig1b[0]], color=Spectral4[1], alpha=alpha, line_width=line_width, legend_label=fig1b[1])
s1.line(x, df[fig1c[0]], color=Spectral4[2], alpha=alpha, line_width=line_width, legend_label=fig1c[1])
s1.line(x, df[fig1d[0]], color=Spectral4[3], alpha=alpha, line_width=line_width, legend_label=fig1d[1])
#### some repetitive code has been omitted here for brevity
# Define the grid
# p = gridplot([[s1, s2],[s3, s4]])
# show the results
show(s1)
Example of a dataset
2022-08-26 04:03:52.000,0,30,30,894.70751953125,-63.785041809082,-0.497732371091843,2.14258599281311,0.0307948496192694,355.496154785156,0,0,0,2.38387619901914E-05,0,102.844131469727,0.040388036519289,0.703329265117645,0,0,0.0244150012731552,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106834815815091
2022-08-26 04:03:53.000,0,30,30,895.21142578125,-63.6380615234375,-0.550026297569275,2.14223098754883,0.0307948496192694,355.496154785156,0,0,0,1.45306594276917E-05,0,102.827079772949,0.0610153041779995,0.733967423439026,0,0,0.0245136469602585,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106870988383889
2022-08-26 04:03:54.000,0,30,30,895.726196289063,-63.6465072631836,-0.533430516719818,2.1423876285553,0.0307948496192694,355.496154785156,0,0,0,8.71746851771604E-06,0,102.834602355957,0.0816425681114197,0.764605581760406,0,0,0.0246122926473618,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106907160952687
2022-08-26 04:03:55.000,0,30,30,896.1552734375,-63.0882987976074,-0.534056782722473,2.14190745353699,0.0307948496192694,355.496154785156,0,0,0,5.21722904522903E-06,0,102.811561584473,0.10226983577013,0.795243740081787,0,0,0.024710938334465,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106943333521485
2022-08-26 04:03:56.000,0,30,30,895.727600097656,-63.0707931518555,-0.515181064605713,2.14224052429199,0.0307948496192694,355.496154785156,0,0,0,3.12787688017124E-06,0,102.827545166016,0.122897103428841,0.825881898403168,0,0,0.0248095821589231,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106979506090283
2022-08-26 04:03:57.000,0,30,30,895.690246582031,-63.511173248291,-0.49309903383255,2.14326453208923,0.0307948496192694,355.496154785156,0,0,0,7.10703216100228E-06,0,102.876693725586,0.143524378538132,0.856520056724548,0,0,0.0249082278460264,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0107015678659081
Any help would be appreciated. :)
tl;dr: how do I import/use the date and time in Bokeh when the source is formatted as follows: "2022-08-26 04:03:57"
UPDATE I got it to be recognized as datetime! Still some kinks and formatting to figure out, but this is what did the trick for me:
x = df[tags.Date_Time_UTC[0]]
x = pd.to_datetime(x)
I also manually removed the trailing decimals from the seconds.
2022-08-26 04:03:56.000 -> 2022-08-26 04:03:56
Further answers and tips are, of course, welcome. But I can continue for now!
Thanks for the help!
Because you have imported pandas
the easiest way to parse a string to a datetime object is pd.to_datetime()
. This function can also parse multiple formats using %f-string notation.
For example
pd.to_datetime('2022-01-01', format='%Y-%m-%d')
and
pd.to_datetime("01/01/2022 00:00:00", format='%d/%m/%Y %H:%M:%S')
will both result in the same datetime object.
If you want to parse a complete column of a pandas DataFrame you could use the .iloc
method. Let's say you want to parse the first column (zero based index).
df.iloc[:,0] = pd.to_datetime(df.iloc[:,0], format="%Y-%m-%d")
should work.
The example below is copied from here and if you want to read the bokeh tutorial, there is one which shows how to enable datetime axes.
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
sample={'A':[pd.to_datetime(x, format='%Y-%m') for x in ['2012-01','2012-02','2012-03']],'B':[7,8,9]}
source = ColumnDataSource(sample)
p = figure(width=400, height=400, x_axis_type='datetime')
p.line(x='A', y='B', source=source, line_width=2)
output_notebook()
show(p)
FYI: The function pd.read_csv()
has a argument parse_dates
which calls pd.to_datetime
while parsing the csv-file. But there are multiple options and the usage depends on the data. So you have to read the documentation because this would make this post really long.