I am trying to use the visualize method to visualize a Dask graph. However, the resulting image is too small because my graph has a lot of nodes. How can I increase its size?
Here is the code:
import dask.dataframe as dd
from dask.diagnostics import ProgressBar
from matplotlib import pyplot as plt
df = dd.read_csv('nyc_parking_tickets_2017.csv')
missing_values = df.isnull().sum()
missing_count = ((missing_values / df.index.size) * 100)
missing_count.visualize()
This code is taken from Data Science with Python and Dask by Jesse Daniel. The dataset comes from this Kaggle dataset on NYC parking tickets.
Dask's defaults for Graphviz are usually reasonable, so it is surprising that the image comes out small. If you want to change the rendered graph itself, you can pass graph-level attributes to the visualize method as keyword arguments (see the docstring). They are forwarded to the Graphviz library.
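As a rough sketch of what passing graph-level attributes could look like: the helper name large_graph_options and the sizes below are my own illustration, not part of Dask. It assumes visualize forwards rankdir and graph_attr to Graphviz, and that Graphviz's size attribute (in inches, with a trailing "!" to scale the drawing up to that size) is what you want to change. Writing to an .svg file is also an assumption on my part; a vector format lets you zoom without the nodes blurring.

```python
def large_graph_options(width_in=40, height_in=40, rankdir="LR"):
    """Build keyword arguments for a Dask collection's visualize() call.

    Hypothetical helper: the name and default sizes are illustrative only.
    """
    return {
        # Layout direction; "LR" draws the graph left to right.
        "rankdir": rankdir,
        # Graphviz graph attribute: target size in inches. The trailing "!"
        # asks Graphviz to scale the drawing up to this size.
        "graph_attr": {"size": f"{width_in},{height_in}!"},
    }


opts = large_graph_options(width_in=60, height_in=30)

# With the collection from the question (needs dask and graphviz installed):
# missing_count.visualize(filename="missing_count.svg", **opts)
```

If the result is still hard to read, increasing the size values or switching rankdir back to the default are the first things I would try.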
You might also mean that the individual nodes are small, perhaps because there are very many of them. In that case, I don't recommend relying on the visualize method for insight once you have more than a few hundred partitions.