
How to visualize, over time, data that is either quantitative or qualitative, but never both?


I have a CSV file where some rows contain words and some contain numbers. Something like:

         column1   column2  column3 
date      2019      2020      2021
color     blue      blue     yellow
velocity    1        22        3 
power       4        2         1 

And I need to visualize it in a plot that lets me visually search for patterns between all the attributes (color, velocity, power) through time. But all the plotly plots I've found only let me track quantitative or qualitative values alone, not together. I can't even picture what a plot would have to look like to allow this... The only way I can think of is to turn each qualitative value into its own row and give it an arbitrary, constant number, like:

       column1   column2  column3 
date      2019      2020      2021
blue       100       100       0  
yellow      0        0        100
velocity    1        22        3 
power       4        2         1 

So in a line plot, for example, there would be a straight line at the top indicating which qualitative value is active, while all the others would sit at the bottom. And I guess each line could be colored according to its row index (date, color, velocity, power), allowing me to identify it visually. But I'm quite sure there is a better way.
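
For reference, the workaround described above could be sketched roughly like this with pandas. This is only an illustration of the idea, not a recommended approach: the constant `LEVEL` and the variable names are made up, and the table is typed in by hand (already transposed) instead of being read from the CSV.

```python
import pandas as pd

# The sample table from the question, already transposed so that
# each row is one date and each attribute is a column.
df = pd.DataFrame({
    "date": [2019, 2020, 2021],
    "color": ["blue", "blue", "yellow"],
    "velocity": [1, 22, 3],
    "power": [4, 2, 1],
})

LEVEL = 100  # arbitrary constant meaning "this qualitative value is active"

# One indicator column per qualitative value (blue, yellow), scaled to LEVEL.
indicators = pd.get_dummies(df["color"]).astype(int) * LEVEL

# Numeric attributes and indicator columns side by side.
wide = pd.concat([df[["date", "velocity", "power"]], indicators], axis=1)

# Back to the layout used in the question: attributes as rows, dates as columns.
print(wide.set_index("date").T)
```

Each column of `wide` other than `date` could then be drawn as one line over time.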

Any plotting library is acceptable, although plotly is preferred because it's easy.


Solution

    • Your data as presented has four attributes (date, color, velocity, power), so a single scatter plot can encode all of them using x, y, marker size and marker color.
    • Color is the obvious channel for the qualitative attribute. The other attributes can be qualitative too, by using categorical data types.
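    import io
    import pandas as pd
    import plotly.express as px
    
    # Read the sample table, then transpose it so that each row is one date
    # and each attribute (date, color, velocity, power) is a column.
    df = pd.read_csv(io.StringIO("""         column1   column2  column3 
    date      2019      2020      2021
    color     blue      blue     yellow
    velocity    1        22        3 
    power       4        2         1  """), sep=r"\s+").T
    
    # After the transpose every column holds strings; convert the numeric ones.
    df["date"] = df["date"].astype(int)
    df["velocity"] = df["velocity"].astype(int)
    df["power"] = df["power"].astype(int)
    
    # x = date, y = velocity, marker color = color, marker size = power
    fig = px.scatter(df, x="date", y="velocity", color="color", size="power")
    # Format the x-axis ticks as integers so the years read as 2019, 2020, 2021.
    fig.update_layout(xaxis={"tickformat":"d"})
    fig.show()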
