
Loading/Saving a map<string, map<string, torch::Tensor>> object in LibTorch for fast read/write


My use case is that I have a C++ object of type map<string, map<string, torch::Tensor>> which I want to serialize to a file and read back, using two functions:

#include <torch/torch.h>

using namespace std;

void save_tensor_map(map<string, map<string, torch::Tensor>> m, string fp) {
    // 
}

map<string, map<string, torch::Tensor>> read_tensor_map(string fp) {
    //
}

What is the simplest way to do this?


Solution

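  • Here is my attempt at writing your map to a file; I think you can deduce the read function from it. It handles a single-level map<string, torch::Tensor>, and the same pattern extends to your nested map if you also write how many entries each inner map holds. I don't have a compiler at hand right now to test it, so please tell me if it raises issues.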

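    #include <cstdint>
    #include <fstream>
    #include <map>
    #include <string>
    #include <torch/torch.h>

    void save_tensor_map(const std::map<std::string, torch::Tensor>& tensors, const std::string& filename) {
      std::ofstream out_file(filename, std::ios::binary);

      for (const auto& entry : tensors) {
        // Write the key, prefixed by its length (fixed-width so the layout does not depend on size_t)
        const std::string& key = entry.first;
        uint64_t key_size = key.size();
        out_file.write(reinterpret_cast<const char*>(&key_size), sizeof(key_size));
        out_file.write(key.data(), key_size);

        // The raw dump below requires a contiguous tensor in CPU memory
        torch::Tensor tensor = entry.second.cpu().contiguous();

        // Write the tensor metadata: number of dimensions, then each dimension as int64_t
        at::IntArrayRef sizes = tensor.sizes();
        int64_t nb_dims = sizes.size();
        out_file.write(reinterpret_cast<const char*>(&nb_dims), sizeof(nb_dims));
        out_file.write(reinterpret_cast<const char*>(sizes.data()), sizeof(int64_t) * nb_dims);

        int64_t scalar_type = static_cast<int64_t>(tensor.scalar_type());
        out_file.write(reinterpret_cast<const char*>(&scalar_type), sizeof(scalar_type));

        int64_t elem_size = tensor.element_size();
        out_file.write(reinterpret_cast<const char*>(&elem_size), sizeof(elem_size));

        // Write the tensor data
        out_file.write(reinterpret_cast<const char*>(tensor.data_ptr()), elem_size * tensor.numel());
      }
    }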
     
    

    In the read function you'll probably need to call torch::from_blob(void* data, at::IntArrayRef sizes, const at::TensorOptions& options) to rebuild each tensor from the bytes you read. from_blob does not take ownership of the buffer, so clone() the result before the buffer is freed. Otherwise it's the same structure as the write function.

    Edit: I just realized you can also make the tensor serialization much simpler with torch::save and torch::load, which can write to and read from a stream such as a std::stringstream (whose contents are themselves easy to write to and read from a file). See there
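
To connect the stream idea back to the nested map in the question, here is a minimal sketch of the container framing. Counts and length-prefixed byte strings go around each key and each serialized value. It is deliberately written against plain byte strings so it compiles without LibTorch; the tensor-specific part is confined to two small callables, shown in the comments. All names here (write_u64, write_blob, NestedMap, save_nested_map, read_nested_map) are hypothetical helpers, and the layout assumes the reader and writer run on the same platform.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Write / read a 64-bit count in host byte order.
inline void write_u64(std::ostream& out, uint64_t n) {
  out.write(reinterpret_cast<const char*>(&n), sizeof(n));
}
inline uint64_t read_u64(std::istream& in) {
  uint64_t n = 0;
  in.read(reinterpret_cast<char*>(&n), sizeof(n));
  return n;
}

// Write / read a byte string prefixed by its length.
inline void write_blob(std::ostream& out, const std::string& bytes) {
  write_u64(out, bytes.size());
  out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}
inline std::string read_blob(std::istream& in) {
  std::string bytes(read_u64(in), '\0');
  in.read(&bytes[0], static_cast<std::streamsize>(bytes.size()));
  return bytes;
}

template <typename T>
using NestedMap = std::map<std::string, std::map<std::string, T>>;

// encode turns a T into a std::string of bytes. With LibTorch it could be:
//   [](const torch::Tensor& t) { std::ostringstream s; torch::save(t, s); return s.str(); }
template <typename T, typename Encode>
void save_nested_map(const NestedMap<T>& m, std::ostream& out, Encode encode) {
  write_u64(out, m.size());                // number of outer entries
  for (const auto& outer : m) {
    write_blob(out, outer.first);          // outer key
    write_u64(out, outer.second.size());   // number of inner entries
    for (const auto& inner : outer.second) {
      write_blob(out, inner.first);            // inner key
      write_blob(out, encode(inner.second));   // serialized value
    }
  }
}

// decode turns a std::string of bytes back into a T. With LibTorch it could be:
//   [](const std::string& b) { std::istringstream s(b); torch::Tensor t; torch::load(t, s); return t; }
template <typename T, typename Decode>
NestedMap<T> read_nested_map(std::istream& in, Decode decode) {
  NestedMap<T> m;
  const uint64_t n_outer = read_u64(in);
  for (uint64_t i = 0; i < n_outer; ++i) {
    const std::string outer_key = read_blob(in);
    auto& inner = m[outer_key];
    const uint64_t n_inner = read_u64(in);
    for (uint64_t j = 0; j < n_inner; ++j) {
      const std::string key = read_blob(in);
      inner[key] = decode(read_blob(in));
    }
  }
  return m;
}
```

To write to disk, pass a std::ofstream opened with std::ios::binary as the stream, and a matching std::ifstream when reading. Because torch::save records the shape and dtype itself, this route needs none of the manual metadata handling, at the cost of an extra in-memory copy of each tensor.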