plot time for scatter chart in log scale - plotly


I've got a wide range of values in seconds that I want to display on a log scale. It works fine if I plot the raw data in seconds, but I'm trying to convert the y-axis to timestamps and keep the log scale.

import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    'Position':(1,2,3,4,5,6,7,8,9),
    'Value':(1,20,821,2300,4500,30000,405600,1023764,11256400),
})

# works
fig = px.scatter(x=df['Position'], y=df['Value'], log_y=True)

When I change the y-axis to datetime, the values aren't correct. When I add the log scale, the values don't appear at all.

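# datetime y-axis: the values aren't correct
fig = px.scatter(x=df['Position'], y=pd.to_datetime(df['Value'], unit='s'))
# datetime y-axis with log scale: the values don't appear at all
fig = px.scatter(x=df['Position'], y=pd.to_datetime(df['Value'], unit='s'), log_y=True)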

The output y-axis should range from 0 to 130 days, 6 hours, 46 mins, 40 secs. I'm not fussed about being specific here, the broad range is fine.
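
As a quick check of that range, here is a minimal sketch using only the standard library (it is not part of the original attempt). It also hints at why the datetime conversion looks wrong: `pd.to_datetime(..., unit='s')` reads each number as seconds since the Unix epoch, so the values become calendar dates in 1970 rather than durations.

```python
from datetime import datetime, timedelta, timezone

largest = 11256400  # largest 'Value' in the sample data, in seconds

# As a duration: this is the range the y-axis is meant to cover
as_duration = timedelta(seconds=largest)
print(as_duration)   # 130 days, 6:46:40

# As a timestamp: seconds counted from 1970-01-01 UTC, i.e. a calendar date
as_timestamp = datetime.fromtimestamp(largest, tz=timezone.utc)
print(as_timestamp)  # 1970-05-11 06:46:40+00:00
```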


Solution

  • The issue is that once the values are converted to datetime objects, Plotly can't apply a logarithmic scale to them directly.

    Instead of converting the data to datetime objects, which breaks the log scale, keep the numeric values (in seconds) for plotting and display time-formatted tick labels on the y-axis.

    import pandas as pd
    import plotly.express as px
    import numpy as np
    from datetime import timedelta
    
    # Your sample data
    df = pd.DataFrame({
        'Position': (1, 2, 3, 4, 5, 6, 7, 8, 9),
        'Value': (1, 20, 821, 2300, 4500, 30000, 405600, 1023764, 11256400),
    })
    
    # Create a figure with numeric values on a log scale
    fig = px.scatter(x=df['Position'], y=df['Value'], log_y=True)
    
    # Create custom time-formatted tick labels
    # Plotly Express does not set tickvals itself, so define them explicitly:
    # one tick per power of ten covering the data range (values are in seconds)
    power_min = int(np.floor(np.log10(df['Value'].min())))
    power_max = int(np.ceil(np.log10(df['Value'].max())))
    y_tick_vals = [10**p for p in range(power_min, power_max + 1)]
    
    # Create time-formatted labels for each tick value
    y_tick_text = []
    for val in y_tick_vals:
        if val > 0:  # Avoid negative or zero values
            # Convert seconds to a readable time format
            delta = timedelta(seconds=val)
            days = delta.days
            hours, remainder = divmod(delta.seconds, 3600)
            minutes, seconds = divmod(remainder, 60)
            
            if days > 0:
                time_str = f"{days}d {hours}h {minutes}m"
            elif hours > 0:
                time_str = f"{hours}h {minutes}m {seconds}s"
            elif minutes > 0:
                time_str = f"{minutes}m {seconds}s"
            else:
                time_str = f"{seconds}s"
                
            y_tick_text.append(time_str)
        else:
            y_tick_text.append("0s")
    
    # Update the y-axis with custom formatted time labels
    fig.update_layout(
        yaxis=dict(
            tickmode='array',
            tickvals=y_tick_vals,
            ticktext=y_tick_text,
            title="Time (log scale)"
        ),
        xaxis_title="Position"
    )
    
    fig.show()
    

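
Powers of ten give labels such as "1d 3h 46m", which are awkward to read on an axis. As a possible variation, the sketch below places ticks at round durations instead. The names `format_seconds`, `time_ticks`, and `CANDIDATE_TICKS` are hypothetical helpers introduced for illustration, and the list of candidate durations is an arbitrary choice, not something Plotly provides.

```python
from datetime import timedelta

# Hypothetical "round" durations to use as tick positions, in seconds
CANDIDATE_TICKS = [1, 10, 60, 600, 3600, 6 * 3600,
                   86400, 7 * 86400, 30 * 86400, 365 * 86400]

def format_seconds(total_seconds):
    """Render a duration in seconds as a compact label such as '1d 3h 46m 40s'."""
    delta = timedelta(seconds=total_seconds)
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = [(delta.days, "d"), (hours, "h"), (minutes, "m"), (seconds, "s")]
    # Keep only the non-zero units; fall back to "0s" if every unit is zero
    label = " ".join(f"{value}{unit}" for value, unit in parts if value)
    return label or "0s"

def time_ticks(values, candidates=CANDIDATE_TICKS):
    """Return (tickvals, ticktext) covering the range of `values` (seconds)."""
    lo, hi = min(values), max(values)
    # Candidates inside the data range, plus the first one above the maximum
    tickvals = [t for t in candidates if lo <= t <= hi]
    above = [t for t in candidates if t > hi]
    if above:
        tickvals.append(above[0])
    return tickvals, [format_seconds(t) for t in tickvals]

values = [1, 20, 821, 2300, 4500, 30000, 405600, 1023764, 11256400]
tickvals, ticktext = time_ticks(values)
print(ticktext)
```

The two lists can then be passed to the figure with `fig.update_yaxes(tickmode='array', tickvals=tickvals, ticktext=ticktext)`, while the data itself stays numeric and `log_y=True` is kept.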