python c++ pybind11

npz file as input in pybind11


I'm trying to pass an .npz file to a C++ function wrapped with pybind11. I call it from Python as follows, where the function pathlookup is implemented in C++:

import numpy as np
import testpy_path
sourcefile1 = np.load('data1.npy')
sourcefile2 = np.load('data2.npz')
testpy_path.pathlookup(sourcefile1, sourcefile2)  # error at sourcefile2

In C++ with pybind11, I receive the NumPy inputs sourcefile1 and sourcefile2 like this:

void pathlookup(py::array_t<double, py::array::c_style | py::array::forcecast> sourcefile1, py::array_t<double, py::array::c_style | py::array::forcecast> sourcefile2){
    std::vector<double> sources1(sourcefile1.size()); 
    std::memcpy(sources1.data(), sourcefile1.data(), sourcefile1.size() * sizeof(double)); 
}

This works for sourcefile1 (the .npy file) but not for the .npz file. What argument type does pathlookup need on the C++ side to accept the .npz data, and how would I copy its contents into a vector?

Thank you


Solution

  • I am not very experienced with numpy, but this is what I found in the manual:

    When you call load() on an .npz file, NumPy creates a numpy.lib.npyio.NpzFile instance, which is not an array. Here is the relevant part of the manual about NpzFile:

    A dictionary-like object with lazy-loading of files in the zipped archive provided on construction.

    NpzFile is used to load files in the NumPy .npz data archive format. It assumes that files in the archive have a .npy extension, other files are ignored.

    The arrays and file strings are lazily loaded on either getitem access using obj['key'] or attribute lookup using obj.f.key. A list of all files (without .npy extensions) can be obtained with obj.files and the ZipFile object itself using obj.zip.

    That means you access your arrays through:

    np.savez("out.npz", x=data)
    x = np.load("out.npz")['x']
    

    Then x can be passed to your function.
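
A minimal sketch of the behaviour described above, assuming an in-memory archive instead of a file on disk; the array names "x" and "y" are made up for illustration. It shows that the object returned by np.load is not an ndarray, and that indexing by name yields a regular array:

```python
import io
import numpy as np

# Build a small in-memory .npz archive; the names "x" and "y" are hypothetical.
buf = io.BytesIO()
np.savez(buf, x=np.arange(6.0).reshape(2, 3), y=np.ones(4))
buf.seek(0)

with np.load(buf) as archive:
    # The loaded object is an NpzFile, not an ndarray.
    is_array = isinstance(archive, np.ndarray)
    # Names of the stored arrays, without the .npy extension.
    names = sorted(archive.files)
    # Indexing by name reads that member and returns a regular ndarray.
    x = archive["x"]

print(type(archive).__name__, is_array, names, x.shape)
```

Only the array pulled out by name (x here) is something a py::array_t<double> parameter can take.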

    https://www.kite.com/python/docs/numpy.lib.npyio.NpzFile

    Edit:

    If you wish to load the NumPy arrays directly from C++ through pybind11, you can do:

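    // Needs <pybind11/pybind11.h>, <pybind11/numpy.h>, <iostream>, <string>
    // and a running Python interpreter (e.g. inside a bound function).
    auto np = py::module::import("numpy");
    // Converting the NpzFile to py::dict loads every array in the archive.
    py::dict d = np.attr("load")("out.npz");
    for(auto k : d)
    {
        // Key: array name; value: the array itself.
        std::cout << k.first.cast<std::string>() << std::endl;
        std::cout << k.second.cast<py::array>().size() << std::endl;
    }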
    

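    Alternatively, convert the loaded archive to a plain dict on the Python side, e.g. dict(np.load("out.npz")), and accept it as a py::dict argument in the C++ function.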
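
For the dict route, the Python side could look roughly like the sketch below. The helper npz_to_dict and the array names "a" and "b" are hypothetical, and the conversion to C-contiguous float64 is an assumption made to match the double-based arrays in the question:

```python
import io
import numpy as np

def npz_to_dict(archive):
    """Copy every array in an NpzFile into a plain dict of C-contiguous float64 arrays."""
    return {name: np.ascontiguousarray(archive[name], dtype=np.float64)
            for name in archive.files}

# In-memory archive standing in for a real .npz file.
buf = io.BytesIO()
np.savez(buf, a=np.array([1, 2, 3]), b=np.eye(2))
buf.seek(0)

with np.load(buf) as archive:
    arrays = npz_to_dict(archive)

# arrays is a real dict, so a binding declared with a py::dict parameter
# (a hypothetical change to pathlookup) could receive it.
```

Building the dict eagerly means every array is read before the archive is closed, so the C++ side never touches the lazily loaded NpzFile.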