Tags: python, bokeh, latitude-longitude

Grouping data by (lat,long) values to determine color for plotting


I have a dataset with latitude and longitude data, as below. Some latitude-longitude coords are repeated. I want to make an interactive Bokeh dot plot where:

  • yellow dots mark lat-long coords that have only one dog
  • red dots mark lat-long coords that have more than one dog

My data:

Type Latitude Longitude

Dog  41.9595 82.494997
Dog  41.4388 82.493585
Dog  41.4388 82.493585
Dog  41.3848 82.493739
Dog  41.3838 82.383883
Dog  41.3848 82.493739
Dog  41.3828 82.383838
Dog  41.2747 82.474484
Dog  41.3838 82.393949
Dog  41.3883 82.373848
Dog  41.3828 82.383838

How do I do this in Python? This is my code so far, and the dots are all the same color. However, I want homes with more than one dog to be a different color.

from bokeh.plotting import figure, show, output_notebook
from bokeh.tile_providers import CARTODBPOSITRON
p = figure(x_axis_type="mercator", y_axis_type="mercator")
p.add_tile(CARTODBPOSITRON)

p.circle(x=Pet_Data['Latitude'],
         y=Pet_Data['Longitude'], 
         line_color="#FF0000", 
         fill_color="#FF0000",
         fill_alpha=0.05)

output_notebook()
show(p)

Solution

  • Here's some starter code to get you unstuck, but please post your own code in the question. pandas is the usual package for reading datasets like this:

    import pandas as pd
    from io import StringIO

    # Normally you'd call pd.read_csv('your.csv'); the data is inlined here
    # so that the example is reproducible (an MCVE).
    data = """Type Latitude Longitude
    Dog  41.9595 82.494997
    Dog  41.4388 82.493585
    Dog  41.4388 82.493585
    Dog  41.3848 82.493739
    Dog  41.3838 82.383883
    Dog  41.3848 82.493739
    Dog  41.3828 82.383838
    Dog  41.2747 82.474484
    Dog  41.3838 82.393949
    Dog  41.3883 82.373848
    Dog  41.3828 82.383838"""

    # Columns are whitespace-separated; the raw string keeps '\s' from being
    # treated as a (invalid) Python string escape.
    df = pd.read_csv(StringIO(data), sep=r'\s+')
    

    Now you can aggregate your dataframe by (lat,long) and define a color from whatever expression you want, e.g. red for coords that have >1 dog at the same Lat/Lon, else yellow:

    # 'red' and 'yellow' are named colors that Bokeh accepts
    df.groupby(['Latitude','Longitude']).agg(lambda g: 'red' if g.size > 1 else 'yellow')
    

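    That's a pandas groupby followed by an aggregate with a lambda expression. The result has one row per unique (Latitude, Longitude) pair, with the color value stored in the remaining column, Type.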

    Please read up on these and play around with df.groupby(['Latitude','Longitude']).agg(...) yourself.
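    For plotting you usually want one color per original row rather than one per group. A minimal sketch of that, assuming the same sample data: `transform("size")` broadcasts each group's row count back onto every row. The column names `color`, `x` and `y` are made up for this example. The last step is also an assumption about what the plot needs: Bokeh's mercator axes expect Web Mercator metres rather than raw degrees, so the sketch projects the coordinates with the standard spherical formula.

```python
import numpy as np
import pandas as pd
from io import StringIO

data = """Type Latitude Longitude
Dog  41.9595 82.494997
Dog  41.4388 82.493585
Dog  41.4388 82.493585
Dog  41.3848 82.493739
Dog  41.3838 82.383883
Dog  41.3848 82.493739
Dog  41.3828 82.383838
Dog  41.2747 82.474484
Dog  41.3838 82.393949
Dog  41.3883 82.373848
Dog  41.3828 82.383838"""
df = pd.read_csv(StringIO(data), sep=r"\s+")

# Number of rows sharing each (Latitude, Longitude) pair,
# repeated on every row of that group (same length as df).
dogs_here = df.groupby(["Latitude", "Longitude"])["Type"].transform("size")

# Hypothetical column "color": red where more than one dog shares the coords.
df["color"] = (dogs_here > 1).map({True: "red", False: "yellow"})

# Hypothetical columns "x"/"y": spherical Web Mercator projection, in metres.
EARTH_RADIUS_M = 6378137.0
df["x"] = EARTH_RADIUS_M * np.radians(df["Longitude"])
df["y"] = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4 + np.radians(df["Latitude"]) / 2))

print(df[["Latitude", "Longitude", "color"]])
```

    If that runs for you, `df["x"]`, `df["y"]` and `df["color"]` should be usable as the x, y and fill/line color arguments of the glyph call in the question. Note that x is built from the longitude and y from the latitude.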