Tags: python, jupyter-notebook, out-of-memory, data-visualization, mayavi

What could I do to build the 3D Bar Chart on my machine using Mayavi?


I want to build a 3D bar chart using Mayavi in a Jupyter Notebook (in a Python virtualenv) on my Asus laptop (Intel Core i7-4510U CPU @ 2.00 GHz, 8 GB of RAM, Windows 10), but I'm getting a grey screen.

Once the data was imported, I clicked New > Python 3 and wrote:

[Image: Mayavi build 3D bar chart]

I used pandas' fast CSV parser, pandas.read_csv(). Once I ran line 4, I could see memory usage rise to 88% of capacity in CleanMem Mini Monitor, and I got results in less than 1 minute.

Then, to build the bar chart, I ran:

df1=df[[0]]
df2=df[[1]]
df3=df[[2]]
mlab.barchart(df1,df2,df3)

Unfortunately, I got this MemoryError

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-6-9736b00b5abc> in <module>
      2 df2=df[[1]]
      3 df3=df[[2]]
----> 4 mlab.barchart(df1,df2,df3)

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\helper_functions.py in the_function(*args, **kwargs)
     35 
     36     def the_function(*args, **kwargs):
---> 37         return pipeline(*args, **kwargs)
     38 
     39     if hasattr(pipeline, 'doc'):

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\helper_functions.py in __call__(self, *args, **kwargs)
     80             scene.disable_render = True
     81         # Then call the real logic
---> 82         output = self.__call_internal__(*args, **kwargs)
     83         # And re-enable the rendering, if needed.
     84         if scene is not None:

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\helper_functions.py in __call_internal__(self, *args, **kwargs)
   1093         """ Override the call to be able to scale automatically the axis.
   1094         """
-> 1095         g = Pipeline.__call_internal__(self, *args, **kwargs)
   1096         gs = g.glyph.glyph_source
   1097         # Use a cube source for glyphs.

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\helper_functions.py in __call_internal__(self, *args, **kwargs)
     90         the last object created by the pipeline."""
     91         self.store_kwargs(kwargs)
---> 92         self.source = self._source_function(*args, **kwargs)
     93         # Copy the pipeline so as not to modify it for the next call
     94         self.pipeline = self._pipeline[:]

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\sources.py in vertical_vectors_source(*args, **kwargs)
   1356 
   1357     data_source = MVerticalGlyphSource()
-> 1358     data_source.reset(x=x, y=y, z=z, scalars=s)
   1359 
   1360     name = kwargs.pop('name', 'VerticalVectorsSource')

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\sources.py in reset(self, **traits)
    306                 traits['u'] = traits['v'] = np.ones_like(s),
    307                 traits['w'] = s
--> 308         super(MVerticalGlyphSource, self).reset(**traits)
    309 
    310     def _scalars_changed(self, s):

c:\infovis\virtualenvs\dev\lib\site-packages\mayavi\tools\sources.py in reset(self, **traits)
    172 
    173         else:
--> 174             points = np.c_[x.ravel(), y.ravel(), z.ravel()].ravel()
    175             points.shape = (-1, 3)
    176             self.trait_set(points=points, trait_change_notify=False)

c:\infovis\virtualenvs\dev\lib\site-packages\numpy\lib\index_tricks.py in __getitem__(self, key)
    404                 objs[k] = objs[k].astype(final_dtype)
    405 
--> 406         res = self.concatenate(tuple(objs), axis=axis)
    407 
    408         if matrix:

<__array_function__ internals> in concatenate(*args, **kwargs)

MemoryError: Unable to allocate array with shape (153543233, 3) and data type int64
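
As a rough check on the error message, the size of the requested array can be worked out from its shape and data type. The sketch below only does that arithmetic, using the shape (153543233, 3) and the int64 dtype reported above. According to the traceback, np.c_ builds this array as a new copy, so it has to fit alongside the DataFrame that is already loaded.

```python
# Size of the array NumPy failed to allocate, taken from the MemoryError message.
rows, cols = 153_543_233, 3
bytes_per_item = 8  # an int64 value occupies 8 bytes

total_bytes = rows * cols * bytes_per_item
total_gib = total_bytes / 2**30

print(f"{total_bytes:,} bytes (about {total_gib:.2f} GiB)")
# prints: 3,685,037,592 bytes (about 3.43 GiB)
```

With memory usage already at 88% after loading the CSV, a single allocation of this size is unlikely to succeed on a machine with 8 GB of RAM.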

And the result was this:

[Image: Result]


Solution

  • Because I kept running out of memory, I had to find a way to reduce the amount of data.

    Inspired by Trifacta, I decided to go with sampling (creating a sample from the CSV file). The following are some of the possible samples I could produce:

    [Image: Sampling]

    For simplicity, I decided to go with random samples. Using Git Bash on Windows 10, I ran a command similar to the following (the number of rows might not be the same as the one I actually used):

    shuf -n 10000 BIGFILE.csv > SAMPLEFILE.csv
    

    Then the procedure to create the visualization was exactly the same, except for the name of the file, and the result was the following:

    [Three images: Mayavi 3D Bar Chart]
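
If shuf is not available, a similar random sample can be drawn in plain Python. The following is a minimal sketch, not the method used above: it uses reservoir sampling to pick k lines uniformly at random while reading the file once, so the full CSV never has to fit in memory. The function names sample_lines and write_sample are hypothetical, and the sketch assumes one record per line. As with shuf, a header row (if the file has one) is treated like any other line.

```python
import random


def sample_lines(path, k, seed=None):
    """Return up to k lines chosen uniformly at random from a text file.

    Uses reservoir sampling, so only k lines are held in memory at a time.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(path, "r", newline="") as f:
        for i, line in enumerate(f):
            if i < k:
                # Fill the reservoir with the first k lines.
                reservoir.append(line)
            else:
                # Replace an existing entry with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = line
    return reservoir


def write_sample(src, dst, k, seed=None):
    """Write a random sample of k lines from src to dst; return the line count."""
    lines = sample_lines(src, k, seed)
    with open(dst, "w", newline="") as out:
        for line in lines:
            # Make sure every sampled line ends with a newline,
            # including the last line of the source file.
            out.write(line if line.endswith("\n") else line + "\n")
    return len(lines)


# Example call, mirroring the shuf command above (file names are placeholders):
# write_sample("BIGFILE.csv", "SAMPLEFILE.csv", 10000)
```

The sampled file can then be loaded with pandas.read_csv() in the same way as the original file.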