
Storing and reading large data files efficiently


I am working on a project with large input files that come from numerical solutions of PDEs. The format of the data is as follows:

x \t y \t f(x,y)

For each value of y there are several values of x, together with the function value evaluated at each point. The data covers roughly [-3, 5] x [-3, 5] in steps of 0.01 in each dimension, so a raw data file is pretty big (about 640,000 entries). Reading it into memory is also quite time-consuming, because the tools I'm working on have to read several raw data files of this type at the same time.
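
For reference, loading one such file currently looks roughly like the sketch below (the file name and the assumption that x varies fastest within each y block are mine); parsing all those numbers from text is the slow part:

    import numpy as np

    # Each row of the raw file is "x <tab> y <tab> f(x, y)";
    # np.loadtxt has to parse every value from text, which is what takes the time.
    x, y, f = np.loadtxt('solution.dat', unpack=True)

    # Recover the regular grid (assumes x varies fastest within each y block)
    xs = np.unique(x)
    ys = np.unique(y)
    f_grid = f.reshape(ys.size, xs.size)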

I'm using Python.

Is there any way to store and read data like this efficiently in Python? The idea is to include a tool that massages these raw data files into something that can be read more efficiently. I'm currently working on interpolating the data and storing some coefficients (essentially trading memory for computing time), but I'm sure there must be an easier way that helps with both memory and time.
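
For context, the interpolation approach I mean could be sketched along these lines (scipy's RectBivariateSpline, the grid sizes and the names here are only my illustration, not the actual tool):

    import numpy as np
    from scipy.interpolate import RectBivariateSpline

    # Hypothetical grid and values standing in for one raw data file
    xs = np.linspace(-3, 5, 801)
    ys = np.linspace(-3, 5, 801)
    f_grid = np.random.normal(size=(xs.size, ys.size))

    # Fit a spline once; afterwards only the knots and coefficients need to
    # be kept, and f can be evaluated at arbitrary (x, y) points on demand.
    spline = RectBivariateSpline(xs, ys, f_grid)
    value = spline.ev(0.5, -1.25)  # interpolated f(0.5, -1.25)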

Thanks, SO community!

PS: I saw related questions about Java. I'm working entirely in Python here.


Solution

  • If you're using numpy (and you probably should be), numpy.save/numpy.savez and numpy.load should be able to handle this pretty easily.

    For example:

    import numpy as np

    # Grid covering [-3, 5] in each dimension, as in the question
    xs = np.linspace(-3, 5, 800)
    ys = np.linspace(-3, 5, 800)
    # Placeholder for the f(x, y) values on that grid
    f_vals = np.random.normal(size=(xs.size, ys.size))
    # Save all three arrays into a single binary .npz archive
    np.savez('the_file.npz', xs=xs, ys=ys, f=f_vals)
    

    is quite quick, and the resulting file is less than 5 MB.
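
    Reading the arrays back is just as simple; a minimal sketch, assuming the keyword names used above:

    import numpy as np

    # np.load on an .npz archive reads each array lazily, on first access
    data = np.load('the_file.npz')
    xs, ys, f_vals = data['xs'], data['ys'], data['f']
    print(xs.shape, f_vals.shape)  # (800,) (800, 800)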