
Why does Python Vincent map visualization not map data from a DataFrame?


I am using Python Vincent map visualization, following this package's introductory examples. I work in an IPython notebook.

I defined a simple pandas DataFrame with country FIPS codes (taken from here). Then I tried to map the DataFrame data with a Vincent map using these FIPS codes, but the resulting visualization does not colour any of the countries. How can I make it work?

import numpy as np
import pandas as pd
import vincent

country_data_tmp = pd.DataFrame({'country_names' : np.array(['Argentina', 'Armenia', 'Australia', 'Austria']),
                                 'country_FIPS' : np.array(['032', '051', '036', '040']),
                                 'my_rate' : np.array([0.254, 0.3456, 0.26, 0.357])})
country_data_tmp.head()


world_topo = r'world-countries.topo.json'

geo_data = [{'name': 'countries',
             'url': world_topo,
             'feature': 'world-countries'}]

vis = vincent.Map(data=country_data_tmp, 
                  geo_data=geo_data, 
                  scale=1100, 
                  data_bind='my_rate', 
                  data_key='country_FIPS',
                  map_key={'counties': 'properties.FIPS'})

vis.display()



Solution

  • They don't display because you have not set map_key correctly. The world-countries.topo.json file identifies each country by its three-letter code, stored in the id field of that file (this corresponds to the field called alpha-3 on the page you linked to). You can see this if you look at the raw data in that JSON file.
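Rather than reading the raw file by eye, you can load it with the standard json module and compare its ids against your codes. This is a sketch that assumes the file follows the usual TopoJSON layout, with the countries stored as a GeometryCollection under objects, keyed by the feature name (world-countries). The helpers topo_ids and unmatched_codes are hypothetical names, not part of Vincent, and the inline sample only stands in for the real file.

```python
import json


def topo_ids(topo, feature):
    """Return the set of geometry ids stored under topo["objects"][feature].

    Assumes the standard TopoJSON layout: a GeometryCollection whose
    geometries each carry an "id" field.
    """
    geometries = topo["objects"][feature]["geometries"]
    return {geom.get("id") for geom in geometries}


def unmatched_codes(codes, ids):
    """Return the codes, in input order, that match no geometry id."""
    return [code for code in codes if code not in ids]


# Tiny stand-in for world-countries.topo.json; the real file is far larger.
sample_text = """
{"type": "Topology",
 "objects": {"world-countries": {"type": "GeometryCollection",
   "geometries": [{"type": "Polygon", "id": "ARG", "arcs": [[0]]},
                  {"type": "Polygon", "id": "AUS", "arcs": [[1]]}]}},
 "arcs": []}
"""
topo = json.loads(sample_text)
ids = topo_ids(topo, "world-countries")

print(unmatched_codes(["032", "051", "036", "040"], ids))  # ['032', '051', '036', '040']
print(unmatched_codes(["ARG", "AUS"], ids))                # []
```

With the real file you would load it via `with open('world-countries.topo.json') as f: topo = json.load(f)`. If every code from your data_key column comes back as unmatched, nothing can be coloured.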

    Also, you set 'name': 'countries' in geo_data, but in map_key you reference it as counties (note the missing r). This is an easy mistake to make, because the example page uses counties: it maps US counties.
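Since this mismatch gave you an uncoloured map rather than an error, a quick check of your own can catch it. The function missing_layers below is a hypothetical helper, not a Vincent API: it simply lists every map_key entry whose layer name does not appear among the 'name' values in geo_data.

```python
def missing_layers(geo_data, map_key):
    """Return map_key layer names that match no 'name' in geo_data."""
    layer_names = {layer["name"] for layer in geo_data}
    return [name for name in map_key if name not in layer_names]


geo_data = [{"name": "countries",
             "url": "world-countries.topo.json",
             "feature": "world-countries"}]

print(missing_layers(geo_data, {"counties": "properties.FIPS"}))  # ['counties']
print(missing_layers(geo_data, {"countries": "id"}))              # []
```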

    If you change these names so that they reference non-empty fields, you'll get a lovely map, because country_alpha3 in your data table matches id in the countries geo data layer.

    N.B. As your code stands, only the countries for which you have data will be plotted. If you want every country outlined but only the ones with data coloured, you can add a layer with all country outlines, as in the second example here. The second code / output section below shows the changes needed to do that.
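Conceptually, the data layer behaves like an inner join between map features and DataFrame rows: a feature is drawn only if its id finds a matching row. The plain-Python model below illustrates that behaviour and shows what an extra outline-only layer adds. It is an illustration under that assumption, not Vincent's internal code, and split_features is a hypothetical name.

```python
def split_features(feature_ids, data_by_key):
    """Model which map features get coloured and which only get an outline.

    feature_ids: ids of all features in the map layer (e.g. alpha-3 codes).
    data_by_key: mapping from data key (e.g. country_alpha3) to the bound value.
    Returns (coloured, outline_only): coloured maps id -> value for features
    with a matching data row; outline_only lists ids with no matching row.
    """
    coloured = {fid: data_by_key[fid] for fid in feature_ids if fid in data_by_key}
    outline_only = [fid for fid in feature_ids if fid not in data_by_key]
    return coloured, outline_only


features = ["ARG", "ARM", "AUS", "AUT", "BRA", "CAN"]
rates = {"ARG": 0.254, "ARM": 0.3456, "AUS": 0.26, "AUT": 0.357}

coloured, outline_only = split_features(features, rates)
print(coloured)      # {'ARG': 0.254, 'ARM': 0.3456, 'AUS': 0.26, 'AUT': 0.357}
print(outline_only)  # ['BRA', 'CAN']
```

Without the outline layer, only the coloured features appear; the outline_only ids are the ones the extra countries_outline layer makes visible.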

    N.B. 2: With your current values of my_rate, the colour contrast is not very noticeable. Try [0, 0.3, 0.7, 1.0] instead to convince yourself that the countries are being coloured differently.
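To see why closely spaced values give little contrast, consider a quantize-style colour scale that splits its domain into equal-width bins, one colour per bin. The sketch assumes a domain of [0, 1] and 9 colours (as in a 9-class ColorBrewer palette); Vincent's actual scale type, domain and palette may differ, so treat the exact bin numbers as illustrative only.

```python
def quantize(value, lo, hi, n_bins):
    """Map value in [lo, hi] to one of n_bins equal-width colour bins (0-based)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    index = int((value - lo) / (hi - lo) * n_bins)
    # Clamp so that value == hi (or a value slightly outside the domain) stays in range.
    return max(0, min(index, n_bins - 1))


original = [0.254, 0.3456, 0.26, 0.357]
spread = [0, 0.3, 0.7, 1.0]

print([quantize(v, 0.0, 1.0, 9) for v in original])  # [2, 3, 2, 3]
print([quantize(v, 0.0, 1.0, 9) for v in spread])    # [0, 2, 6, 8]
```

Under these assumptions the original rates land in two adjacent bins, so the shades are hard to tell apart, while the spread-out values cover most of the palette.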

    Code

    #Data setup bit - Input[1] from your notebook
    #Note new name for country code country_alpha3
    
    import pandas as pd
    import numpy as np
    import vincent
    
    country_data_tmp = pd.DataFrame({'country_names' : np.array(['Argentina', 'Armenia', 'Australia', 'Austria']),
                                     'country_alpha3' : np.array(['ARG','ARM','AUS','AUT']),
                                     'my_rate' : np.array([0.254, 0.3456, 0.26, 0.357])})
    country_data_tmp.head()
    
    #map drawing bit Input[2] from your notebook
    #Note the changes in variable names
    
    world_topo = r'world-countries.topo.json'
    
    geo_data = [{'name': 'countries',
                 'url': world_topo,
                 'feature': 'world-countries'}]
    
    vis = vincent.Map(data=country_data_tmp, 
                      geo_data=geo_data, 
                      scale=1100, 
                      data_bind='my_rate', 
                      data_key='country_alpha3',
                      map_key={'countries': 'id'})
    
    vis.display()
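The codes in your original country_FIPS column ('032', '051', '036', '040') are ISO 3166-1 numeric codes for Argentina, Armenia, Australia and Austria. Instead of retyping them as alpha-3 codes, you could translate them with a lookup table. This sketch hand-writes the table for just these four countries; a real one would need every entry from a complete ISO 3166-1 list. NUMERIC_TO_ALPHA3 and to_alpha3 are hypothetical names.

```python
# Hand-written lookup covering only the four countries in the question
# (ISO 3166-1 numeric code -> alpha-3 code).
NUMERIC_TO_ALPHA3 = {
    "032": "ARG",  # Argentina
    "051": "ARM",  # Armenia
    "036": "AUS",  # Australia
    "040": "AUT",  # Austria
}


def to_alpha3(numeric_codes):
    """Convert numeric codes to alpha-3, raising KeyError on unknown codes."""
    return [NUMERIC_TO_ALPHA3[code] for code in numeric_codes]


print(to_alpha3(["032", "051", "036", "040"]))  # ['ARG', 'ARM', 'AUS', 'AUT']
```

With pandas, `country_data_tmp['country_alpha3'] = country_data_tmp['country_FIPS'].map(NUMERIC_TO_ALPHA3)` would add the column in one step; any code missing from the dictionary comes out as NaN rather than raising an error.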
    

    Output

    Output of script with sample data

    Code with outline layer plus data layer (coloured for those with data):

    #Replace input[2] with this to add a layer with outline only
    
    world_topo = r'world-countries.topo.json'
    
    geo_data = [{'name': 'countries',
                 'url': world_topo,
                 'feature': 'world-countries'},
               {'name': 'countries_outline',
                 'url': world_topo,
                 'feature': 'world-countries'}]
    
    vis = vincent.Map(data=country_data_tmp, 
                      geo_data=geo_data, 
                      scale=100, 
                      data_bind='my_rate', 
                      data_key='country_alpha3',
                      map_key={'countries': 'id'})
    
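    # vis.marks[1] is the second layer (countries_outline). Removing its update
    # properties and setting a black stroke draws it as plain country outlines.
    del vis.marks[1].properties.update
    vis.marks[1].properties.enter.stroke.value = '#000'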
    
    vis.display()
    

    Output (outline layer plus data layer)

    Image with all countries outlined and those with data coloured