Tags: python, pandas, dataframe, bokeh

Plot data from a row using column names as the x-axis in Bokeh


I'm starting a project where I want to create an interactive plot from a dataset of Swedish population statistics (Swedish_Population_Statistics.csv).

For now I'm just trying to plot the first row's values from the 2000 to 2012 columns. To do that, I use this:

    import pandas as pd
    from bokeh.io import output_file
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure
    from bokeh.plotting import show

    output_file('test.html')

    df = pd.read_csv('Swedish_Population_Statistics.csv', encoding="ISO-8859-1")
    df.dropna(inplace=True)  # Drop rows with missing attributes
    df.drop_duplicates(inplace=True)  # Remove duplicates

    # Drop all the columns I don't use for now
    df.drop(['region', 'marital_status', 'sex'], inplace=True, axis=1)

    x = df.loc[[0]]

    print(x)

This gives me the following DataFrame:

        2000   2001   2002   2003   2004   2005   2006   2007   2008   2009   2010   2011   2012
    0  10406  10362  10322  10288  10336  10336  10429  10585  10608  10718  10860  11121  11288
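To reproduce this without the CSV file, the same one-row frame can be rebuilt by hand from the printed values. This is a minimal sketch, assuming the year labels are strings (which is what read_csv produces from a header row); the names years and values are just illustrative.

```python
import pandas as pd

# Year labels as strings, as read_csv would produce them from the CSV header
years = [str(year) for year in range(2000, 2013)]

# Population values copied from the printed output above
values = [10406, 10362, 10322, 10288, 10336, 10336, 10429,
          10585, 10608, 10718, 10860, 11121, 11288]

# One-row DataFrame with index label 0, like the result of df.loc[[0]]
x = pd.DataFrame([values], columns=years)
print(x)
```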

Now I want to use the column names as the x-axis and the row values as the y-axis.

This is where I'm stuck.

I figure the code should look something like this, but I can't work out what to pass as x and y:

    x = df.columns.tolist()  # Take the column names as a list
    y = df.loc[[0]].values.tolist()  # Take the first row
    source = ColumnDataSource(data=dict(x=x, y=y))

    p = figure(title="Test")
    p.line(x='x', y='y', source=source, line_color="blue", line_width=2)

I get this warning:

    BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('x', 13), ('y', 1)

I don't understand why the lengths differ, since I used tolist() on both.
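A quick check on a hand-built copy of the row shows where the 13-vs-1 difference comes from. This sketch rebuilds the frame from the printed values instead of reading the CSV; the variable names nested and flat are illustrative. The key point is that df.loc[[0]] (double brackets) returns a one-row DataFrame, while df.loc[0] (single brackets) returns the row as a Series.

```python
import pandas as pd

years = [str(year) for year in range(2000, 2013)]
values = [10406, 10362, 10322, 10288, 10336, 10336, 10429,
          10585, 10608, 10718, 10860, 11121, 11288]
df = pd.DataFrame([values], columns=years)

# Double brackets: a one-row DataFrame, so .values has shape (1, 13)
# and .tolist() returns a list holding a single inner list
nested = df.loc[[0]].values.tolist()
print(len(nested), len(nested[0]))  # 1 13

# Single brackets: the row as a Series, so .tolist() is already flat
flat = df.loc[0].tolist()
print(len(flat))  # 13

# The column labels and the flat row now have matching lengths
print(len(df.columns.tolist()) == len(flat))  # True
```

So ColumnDataSource sees a y column with one element (the inner list), which is exactly the ('y', 1) in the warning.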

Any help would be much appreciated; I've been trying to find a solution for the past 3 hours with no success.


Solution

  • Okay, I found my problem: y was a 2-dimensional list (one inner list holding all 13 values, hence the length of 1), but I needed a 1-dimensional list. That leads me to this working code:

    import pandas as pd
    from bokeh.io import output_file
    from bokeh.plotting import figure, show

    output_file('test.html')
    
    df = pd.read_csv('Swedish_Population_Statistics.csv', encoding="ISO-8859-1")
    df.dropna(inplace=True)  # Drop rows with missing attributes
    df.drop_duplicates(inplace=True)  # Remove duplicates
    
    # Drop all the columns I don't use for now
    df.drop(['region', 'marital_status', 'sex'], inplace=True, axis=1)
    
    x = df.columns.tolist()
    y = df.loc[[0]]
    temp = []
    temp2 = []
    
    # Copy each value of the row into a 1-dimensional list, one year column at a time
    
    for i in range(13):
        temp.append(y[str(2000+i)].tolist())
        temp2.append(temp[i][0])
    
    p = figure(title="Test", sizing_mode="scale_both")
    p.line(x, temp2, line_color="blue", line_width=2)
    p.circle(x, temp2, fill_color="white", size=8)
    
    show(p)
    

    With this result:

    [Image: plot]
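A shorter variant of the same fix, sketched under a few assumptions: the row is taken with single brackets (df.loc[0] returns a Series, so .tolist() is already 1-D and the loop is not needed), the data goes through a ColumnDataSource as in the original attempt, and the year labels are converted to int so the x-axis is numeric. The frame is rebuilt from the printed values so the snippet runs without the CSV; in the real script df would come from read_csv as above. The axis labels are illustrative additions, p.scatter draws the circle markers, and save writes test.html instead of opening a browser the way show does.

```python
import pandas as pd
from bokeh.io import output_file, save
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Stand-in for the cleaned DataFrame (year columns only)
years = [str(year) for year in range(2000, 2013)]
values = [10406, 10362, 10322, 10288, 10336, 10336, 10429,
          10585, 10608, 10718, 10860, 11121, 11288]
df = pd.DataFrame([values], columns=years)

row = df.loc[0]  # Series: index = year labels, values = population counts

source = ColumnDataSource(data=dict(
    x=[int(year) for year in row.index],  # numeric years for a linear x-axis
    y=row.tolist(),                       # flat list, same length as x
))

# Axis labels are illustrative additions
p = figure(title="Test", x_axis_label="Year", y_axis_label="Population")
p.line(x="x", y="y", source=source, line_color="blue", line_width=2)
p.scatter(x="x", y="y", source=source, fill_color="white", size=8)

output_file("test.html")
save(p)  # writes test.html; use show(p) to open it in a browser instead
```

One caveat: if row 0 is removed by dropna() or drop_duplicates(), df.loc[0] raises a KeyError, whereas df.iloc[0] selects the first remaining row by position.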