python, pandas, bokeh, blaze

Big Data with Blaze and Pandas


I want to know whether this approach would be overkill for a project. I have a 4 GB file that my computer obviously can't handle in memory. Would it be overkill to use Blaze to split the file into more manageable sizes, open the pieces with pandas, and visualize them with Bokeh?

I know pandas has a chunking option (the chunksize parameter of read_csv), but the reason I want to split the file is that there are specific rows, tied to specific names, that I need to analyze.

Is there a different approach you would take that won't crash my laptop and doesn't require setting up Hadoop or any AWS service?


Solution

  • Pandas chunking with pd.read_csv(..., chunksize=...) works well; see the first sketch after this list.

    Alternatively, dask.dataframe mimics the pandas interface and handles the chunking for you (second sketch below).
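A minimal sketch of the chunked-read approach, assuming a hypothetical big_file.csv with a name column (both names are placeholders, not from the original post). Each chunk is filtered for the names of interest, so only the relevant rows stay in memory:

```python
import pandas as pd

# Hypothetical names to analyze; adjust to your data.
TARGET_NAMES = {"Alice", "Bob"}
matches = []

# Read the large CSV in chunks of 100,000 rows so only one chunk
# is held in memory at a time.
for chunk in pd.read_csv("big_file.csv", chunksize=100_000):
    # Keep only the rows whose "name" column matches the names we care about.
    matches.append(chunk[chunk["name"].isin(TARGET_NAMES)])

# Combine the small filtered pieces into one DataFrame for analysis/plotting.
result = pd.concat(matches, ignore_index=True)
print(result.head())
```

The filtered result is typically small enough to pass straight to Bokeh.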
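The equivalent with dask.dataframe, again assuming the same hypothetical file and column names. Dask builds the computation lazily and only materializes the small filtered result when .compute() is called:

```python
import dask.dataframe as dd

# dask.dataframe splits the CSV into partitions and evaluates lazily,
# so nothing is loaded into memory until compute() runs.
ddf = dd.read_csv("big_file.csv")

# Same filter as the pandas version; compute() returns a regular
# (small) pandas DataFrame that Bokeh can plot directly.
result = ddf[ddf["name"].isin(["Alice", "Bob"])].compute()
print(result.head())
```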