rust, serde, msgpack

Can I send an RMPV `Value` back to rmp_serde for deserialization?


I have a large, somewhat complicated data structure that I can serialize and deserialize with serde and rmp-serde, but I find that deserialization is quite slow. I think this is because my data structure includes two rather large HashMaps. I don't know how efficiently rmp_serde::from_slice can create the HashMap -- will it initialize using .with_capacity or does it just create a HashMap and insert one-by-one? And besides, I've found that AHashMap gives me considerable performance improvements elsewhere, so I'm trying to avoid using the default HashMap.
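To make the two strategies in question concrete, here is a minimal std-only sketch. It does not touch rmp_serde at all; the function names are made up for illustration. It counts how often a map's capacity changes while keys are inserted, which is a rough proxy for how often it reallocates and rehashes.

```rust
use std::collections::HashMap;

/// Inserts `n` keys and counts how often the map's capacity changes,
/// i.e. how often the map had to grow while being filled.
pub fn count_resizes(mut map: HashMap<u32, u32>, n: u32) -> usize {
    let mut resizes = 0;
    let mut last_capacity = map.capacity();
    for i in 0..n {
        map.insert(i, i);
        if map.capacity() != last_capacity {
            resizes += 1;
            last_capacity = map.capacity();
        }
    }
    resizes
}

/// Starts from an empty map and lets it grow on demand.
pub fn resizes_without_prealloc(n: u32) -> usize {
    count_resizes(HashMap::new(), n)
}

/// Reserves room for all `n` entries up front.
pub fn resizes_with_prealloc(n: u32) -> usize {
    count_resizes(HashMap::with_capacity(n as usize), n)
}
```

A map created with `with_capacity(n)` is documented to hold at least `n` elements without reallocating, so the pre-allocated variant should never grow, while the empty map grows several times on the way to `n` entries.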

I want to try deserializing with rmpv::decode::read_value, but I'd like to leave most of the deserialization to rmp_serde and only implement some of it myself, starting from a Value. Is there a way to choose which pieces I deserialize manually?

Conceptually, what I'd like to do is something like:

let v = rmpv::decode::read_value(&mut reader).unwrap();   // get some Value
let arr : &Vec<Value> = v.as_array().unwrap();            // v is known to be an array
let first_value : MyType = deserialize_manually(&arr[0]); // I'll convert the Value myself
let second_value : AnotherType = arr[1].into();          // allow rmpv to convert Value for me

I'm currently using rmp-serde 0.14 and rmpv 0.4.7. The rmp_serde changelog and release page don't give granular details on what has changed, so I have no reason yet to believe that upgrading to the current version (v0.15.4 as of writing this question) would provide any new capabilities.

I know that serde provides a deserialize_with attribute. Maybe this is the appropriate route, so alternatively my question would be: how can I use deserialize_with to deserialize a specific MsgPack field?


Solution

  • I was able to get this to work using serde's deserialize_with attribute. First, I had to annotate the field in my struct:

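    use serde::Deserialize;
    use std::collections::HashMap;

    #[derive(Deserialize)]
    struct MyStruct {
        some_number: i32,
        // Route only this field through the custom function below.
        #[serde(deserialize_with = "de_first_value")]
        first_value: HashMap<i32, String>, // the key and value types are just examples
        second_value: AnotherType,
    }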
    

    Then I wrote the function that drives deserialization of that field. Because I'm deserializing a HashMap, I followed serde's guide "Implement Deserialize for a custom map type":

    fn de_first_value<'de, D>(deserializer: D) -> Result<HashMap<i32, String>, D::Error>
    where
        D: serde::de::Deserializer<'de>,
    {
        // Hint that a map is expected, so the visitor's visit_map is called.
        deserializer.deserialize_map(MyHmVisitor)
    }
    

    Then I defined MyHmVisitor and implemented the Visitor trait for it. To deserialize a HashMap, I have to implement visit_map. I assume other types could be handled the same way by overriding the corresponding visit_* methods, whose default implementations all fail with a type error:

    struct MyHmVisitor;
    
    impl<'de> serde::de::Visitor<'de> for MyHmVisitor {
        type Value = HashMap<i32, String>;
    
        /// "This is used in error messages. The message should complete the sentence
        /// 'This Visitor expects to receive ...'"
        /// https://docs.serde.rs/src/serde/de/mod.rs.html#1270
        fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
            write!(formatter, "a HashMap<i32, String>")
        }
    
        fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
        where
            A: serde::de::MapAccess<'de>,
        {
            // extract the size hint from the serialized map. If it doesn't exist, default to 0
            let capacity = map.size_hint().unwrap_or(0);
    
            let mut hm = HashMap::with_capacity(capacity);
    
            while let Some((k, v)) = map.next_entry()? {
                hm.insert(k,v);
            }
    
            Ok(hm)
        }
    }
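Since the question mentions wanting a non-default hasher such as AHashMap, the body of visit_map can be made generic over the hasher. The following is a std-only sketch of that loop, not the answer's actual code: an iterator stands in for serde's MapAccess, `collect_with_hint` and `MAX_PREALLOC` are hypothetical names, and the cap value is arbitrary. Capping the hint guards against a huge allocation when the serialized length comes from untrusted input; as far as I can tell, serde's built-in HashMap impl also pre-sizes from a capped hint.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};

/// Upper bound on how many slots are reserved from a size hint
/// (an arbitrary value chosen for this sketch).
pub const MAX_PREALLOC: usize = 4096;

/// Collects `(key, value)` pairs into a `HashMap` using any hasher `S`,
/// pre-allocating from the iterator's size hint. The iterator's upper
/// bound plays the role of `MapAccess::size_hint()` in the answer.
pub fn collect_with_hint<K, V, S, I>(entries: I) -> HashMap<K, V, S>
where
    K: Eq + Hash,
    S: BuildHasher + Default,
    I: Iterator<Item = (K, V)>,
{
    // A missing hint is treated as 0; a large hint is capped.
    let capacity = entries.size_hint().1.unwrap_or(0).min(MAX_PREALLOC);
    let mut map = HashMap::with_capacity_and_hasher(capacity, S::default());
    for (k, v) in entries {
        map.insert(k, v);
    }
    map
}
```

In a real Visitor, the same two steps (capacity from `map.size_hint()`, then `with_capacity_and_hasher`) would replace `HashMap::with_capacity` in visit_map, with `S` set to the hasher type of the map you want to build.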