
Radial heatmap from similarity matrix in Python


Summary

I have a 2880x2880 similarity matrix (about 8.3 million points). My attempt with HoloViews resulted in a 500 MB HTML file that never finishes opening. How can I make a radial heatmap of the matrix?

Details

I had data from 10 different places, measured over one whole year. The hourly measurements of each month were turned into arrays, so each month had 24 arrays (one for all 00:00 values, one for all 01:00 values, ..., one for all 23:00 values).

Each array is 28 to 31 cells long (one cell per day of the month), and each cell holds a measurement of the quantity I'm analyzing. That gives 24 arrays for each of the 12 months, i.e. 24x12 = 288 arrays per place, and with 10 places a total of 2880 arrays. All of them were compared with each other, and the similarity coefficients were saved in a 2880x2880 matrix.
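To keep track of which row or column of the matrix belongs to which place, month and hour, a small index mapping helps. This is a minimal sketch that assumes the rows are ordered place by place, then month by month, then hour by hour (the question doesn't state the actual order); `index_to_key`, `key_to_index` and `label` are hypothetical helpers, and the label format copies the `Place01Jan0800` style mentioned below.

```python
# Assumed layout (not stated in the question): place-major, then month, then hour.
N_PLACES, N_MONTHS, N_HOURS = 10, 12, 24
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def index_to_key(i):
    """Row/column index -> (place, month, hour), all zero-based."""
    place, rest = divmod(i, N_MONTHS * N_HOURS)
    month, hour = divmod(rest, N_HOURS)
    return place, month, hour


def key_to_index(place, month, hour):
    """(place, month, hour) -> row/column index in the 2880 x 2880 matrix."""
    return (place * N_MONTHS + month) * N_HOURS + hour


def label(i):
    """Label in the question's style, e.g. 'Place01Jan0800'."""
    place, month, hour = index_to_key(i)
    return f"Place{place + 1:02d}{MONTHS[month]}{hour:02d}00"


print(N_PLACES * N_MONTHS * N_HOURS)  # 2880
print(label(8))     # Place01Jan0800
print(label(2879))  # Place10Dec2300
```

With this layout each place occupies a contiguous 288x288 block along the diagonal, so place boundaries fall at multiples of 288.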

I'm trying to turn it into a radial heatmap like the one in HoloViews, but without the ticks and labels (labels in the format Place01Jan0800 would be cumbersome to look at for 2880 rows), just the shape, colors and divisions: [image: example radial heatmap from HoloViews]

I managed to create the HTML file itself, but it ended up being 500 MB, so it never renders when I open it; the page just stays blank. Below is a minimal example of what I have, with the loading of the data file replaced by randomly generated data.

import sys
sys.setrecursionlimit(10000)  # raised in the original attempt; kept as-is

import gc
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.plotting import show

# Function creating dummy data for this example
def transformer(dimension=2880):
    dummy_matrix = np.random.rand(dimension, dimension)  # Fake, similar data

    # One row label and one column label per cell (placeholders for labels
    # such as "Place01Jan0800"). Cell k of the flattened matrix sits at
    # row k // dimension and column k % dimension.
    row_vals = np.repeat(np.arange(dimension), dimension).astype(str)  # 0,0,...,1,1,...
    col_vals = np.tile(np.arange(dimension), dimension).astype(str)    # 0,1,...,0,1,...
    val_vals = dummy_matrix.ravel()  # Flatten the matrix row by row
    idx_vals = np.arange(dimension * dimension)

    return idx_vals, val_vals, row_vals, col_vals

idx_arr, val_arr, row_arr, col_arr = transformer()
df = pd.DataFrame({"values": val_arr, "x-label": row_arr, "y-label": col_arr}, index=idx_arr)

hv.extension('bokeh')
heatmap = hv.HeatMap(df, ["x-label", "y-label"])
heatmap.opts(opts.HeatMap(cmap="viridis", radial=True))

gc.collect() # Attempt to save memory, because this thing is huge
show(hv.render(heatmap))

I had a look at datashader to see if it would help, but I have no idea how to plug it into this radial heatmap (or whether that's possible at all), since the radial heatmap doesn't seem to support datashading.
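The core idea behind datashader is aggregation: reduce the data to roughly as many cells as can actually be displayed before rendering. The same can be done by hand before handing the data to `hv.HeatMap`. A minimal sketch, swapping datashader's automatic aggregation for a manual block average with NumPy (`block_mean` is a hypothetical helper, and the random matrix is a stand-in). Assuming rows are ordered place, month, hour, any factor that divides 24 keeps every block inside a single place and month; factor 24 gives one cell per place-month pair.

```python
import numpy as np


def block_mean(matrix, factor):
    """Average non-overlapping factor x factor blocks of a square matrix."""
    n = matrix.shape[0]
    if matrix.shape != (n, n) or n % factor:
        raise ValueError("matrix must be square with a side divisible by factor")
    m = n // factor
    # Axes 1 and 3 index positions inside a block; averaging them collapses each block
    return matrix.reshape(m, factor, m, factor).mean(axis=(1, 3))


rng = np.random.default_rng(0)
sim = rng.random((2880, 2880))  # stand-in for the real similarity matrix

small = block_mean(sim, 24)     # 120 x 120 = 14,400 cells instead of 8,294,400

# Long format (one entry per cell) for building the DataFrame passed to hv.HeatMap
rows, cols = np.indices(small.shape)
x_labels = rows.ravel().astype(str)
y_labels = cols.ravel().astype(str)
values = small.ravel()
```

`x_labels`, `y_labels` and `values` can take the place of `row_arr`, `col_arr` and `val_arr` in the DataFrame above. With 576 times fewer cells, the generated HTML should be far smaller, though the exact size depends on HoloViews/Bokeh.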

So I have no idea how to tackle this. I would be content with a broad overview: I don't need the details, the hover info box, zooming or any other fancy extra features, just a general overview for a presentation. I'm open to any solution.
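Since only a static overview for a presentation is needed, one option is to skip HTML entirely and render a PNG with matplotlib's polar axes. This is a minimal sketch, not HoloViews' own implementation: each matrix row becomes an angular sector and each column a ring, with ticks and labels turned off. `radial_heatmap` and its `division_every` option are hypothetical names; the 120x120 random matrix is stand-in data (e.g. a downsampled version of the real matrix).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render straight to a file, no window or HTML needed
import matplotlib.pyplot as plt
import numpy as np


def radial_heatmap(matrix, path, inner_radius=0.3, cmap="viridis",
                   division_every=None, dpi=200):
    """Save a static radial heatmap of `matrix` as an image file.

    Rows become angular sectors, columns become concentric rings.
    `division_every` draws a white radial line every N rows.
    """
    n_rows, n_cols = matrix.shape
    theta = np.linspace(0.0, 2.0 * np.pi, n_rows + 1)    # sector edges
    radius = np.linspace(inner_radius, 1.0, n_cols + 1)  # ring edges

    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={"projection": "polar"})
    # With 1-D edge arrays, pcolormesh wants C shaped (len(radius)-1, len(theta)-1)
    mesh = ax.pcolormesh(theta, radius, matrix.T, cmap=cmap, shading="flat")
    if division_every:
        for t in theta[::division_every]:
            ax.plot([t, t], [inner_radius, 1.0], color="white", linewidth=0.8)
    ax.set_ylim(0.0, 1.0)  # keep the empty hole in the middle
    ax.set_axis_off()      # no ticks, tick labels or grid
    fig.colorbar(mesh, ax=ax, shrink=0.7)
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)


# Usage with stand-in data: 120 x 120, e.g. one cell per place-month pair
rng = np.random.default_rng(0)
demo = rng.random((120, 120))
out_path = os.path.join(tempfile.gettempdir(), "radial_heatmap_demo.png")
radial_heatmap(demo, out_path, division_every=12)  # assumed: 12 rows per place
```

Passing the full 2880x2880 matrix should also work but may be slow and memory-hungry (8.3 million quadrilaterals). An 8-inch figure at 200 dpi is roughly 1600 px wide, so the inner rings cannot show 2880 separate sectors anyway; downsampling first is usually enough at presentation size.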


Solution

  • I recommend using a regular heatmap instead of a radial heatmap to show the similarity matrix, for two reasons:

    1. A radial heatmap is designed for a periodic variable. The time variable (the 288 month-hour slots) can be considered periodic; however, I think the full 288*10 axis (288 slots for each of 10 places) is no longer periodic because of the "place" dimension.
    2. Near the center of a radial heatmap the rings are shortest, so the cells get squeezed together too densely for a human to make sense of them.

    The following is a simple example that draws a heatmap.

    import matplotlib.pyplot as plt
    from matplotlib.colors import Normalize
    import numpy as np
    
    n = 2880
    m = 2880
    dummy_matrix = np.random.rand(m, n)
    
    fig = plt.figure(figsize=(50, 50))  # change the figsize to control the resolution (5000x5000 px at the default 100 dpi)
    ax = fig.add_subplot(111)
    cmap = plt.get_cmap("Blues")  # you may use another built-in colormap or define your own
    # imshow already scales colors to the data's min and max; an explicit Normalize
    # lets you fix the range instead (e.g. vmin=0, vmax=1 for coefficients in [0, 1]).
    norm = Normalize(vmin=np.amin(dummy_matrix), vmax=np.amax(dummy_matrix))
    image = ax.imshow(dummy_matrix, cmap=cmap, norm=norm)
    plt.colorbar(image)
    
    plt.show()
    

    Which gives: [image: the resulting heatmap]

    Another idea: perhaps computing the similarity matrix is unnecessary. You could plot the original 288 * 10 data with a radial heatmap or just a normal heatmap, and judge the similarity of the data directly from the color distribution.
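To keep the "divisions" the question asks for while using a plain heatmap, white lines can be drawn at the place boundaries. A minimal sketch building on the `imshow` approach, assuming rows and columns are grouped place by place with 288 slots each and that the coefficients lie in [0, 1] (adjust `vmin`/`vmax` otherwise); the matrix is random stand-in data and the output file name is arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to a file
import matplotlib.pyplot as plt
import numpy as np

N_PLACES, SLOTS_PER_PLACE = 10, 288  # assumed: rows/columns grouped place by place

rng = np.random.default_rng(0)
n = N_PLACES * SLOTS_PER_PLACE
sim = rng.random((n, n))  # stand-in for the real similarity matrix

fig, ax = plt.subplots(figsize=(10, 10))
image = ax.imshow(sim, cmap="Blues", vmin=0.0, vmax=1.0)
# imshow centres pixels on integers, so the edge between pixel k-1 and k is at k - 0.5
for k in range(SLOTS_PER_PLACE, n, SLOTS_PER_PLACE):
    ax.axhline(k - 0.5, color="white", linewidth=0.8)
    ax.axvline(k - 0.5, color="white", linewidth=0.8)
ax.set_xticks([])  # no tick labels, just shape, colors and divisions
ax.set_yticks([])
fig.colorbar(image, ax=ax, shrink=0.8)

out_path = os.path.join(tempfile.gettempdir(), "heatmap_with_divisions.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)
```

The diagonal blocks then show how each place compares with itself, and the off-diagonal blocks how pairs of places compare.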