I have a large, somewhat complicated data structure that I can serialize and deserialize with serde and rmp-serde, but I find that deserialization is quite slow. I think this is because my data structure includes two rather large HashMaps. I don't know how efficiently rmp_serde::from_slice creates a HashMap: does it initialize the map using .with_capacity, or does it create an empty HashMap and insert entries one by one? Besides, I've found that AHashMap gives me considerable performance improvements elsewhere, so I'm trying to avoid the default HashMap.
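To make the with_capacity concern concrete, here is a minimal standard-library sketch of the two strategies. The function names build_preallocated and count_growths are hypothetical, and this says nothing about what rmp_serde itself does internally; it only illustrates why the distinction could matter for large maps.

```rust
use std::collections::HashMap;

/// Build a map of `n` entries after reserving space for all of them up front.
/// Returns the map together with the capacity it had before any insert.
fn build_preallocated(n: usize) -> (HashMap<usize, usize>, usize) {
    let mut map: HashMap<usize, usize> = HashMap::with_capacity(n);
    let initial_capacity = map.capacity();
    for i in 0..n {
        map.insert(i, i * 2);
    }
    (map, initial_capacity)
}

/// Build the same map starting from an empty one and count how many times
/// the reported capacity changed, i.e. how often the table had to grow.
fn count_growths(n: usize) -> usize {
    let mut map: HashMap<usize, usize> = HashMap::new();
    let mut growths = 0;
    let mut last = map.capacity();
    for i in 0..n {
        map.insert(i, i * 2);
        if map.capacity() != last {
            growths += 1;
            last = map.capacity();
        }
    }
    growths
}
```

With preallocation the capacity should stay fixed while the entries are inserted, whereas the empty map has to grow (and rehash) several times on the way to the same size.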
I want to try deserializing with rmpv::decode::value::read_value, but I'd like to leave most of the deserialization to rmp_serde and only implement some of it myself, starting from a Value. Is there a way to choose which pieces I deserialize manually?
Conceptually, what I'd like to do is something like:
let v = rmpv::decode::read_value(&mut reader).unwrap(); // get some Value
let arr: &Vec<Value> = v.as_array().unwrap(); // v is known to be an array
let first_value: MyType = deserialize_manually(&arr[0]); // I'll convert the Value myself
let second_value: AnotherType = arr[1].into(); // wished-for: let rmpv convert the Value for me
I'm currently using rmp-serde 0.14 and rmpv 0.4.7. The rmp_serde changelog and release page don't provide granular details on what has changed, so I have no reason yet to believe that upgrading to the current version (v0.15.4 as of writing this question) will provide any new capabilities.
I know that serde provides a deserialize_with attribute. Maybe this is the appropriate route to take, so alternatively, my question would be: how can I use deserialize_with to deserialize a specific MsgPack field?
I was able to get this to work using deserialize_with. First, I had to annotate my struct:
#[derive(serde::Deserialize)]
struct MyStruct {
    some_number: i32,
    #[serde(deserialize_with = "de_first_value")]
    first_value: HashMap<i32, String>, // the key and value types are only an example
    second_value: AnotherType,
}
Then I create the function that drives deserialization of that field. Because I'm deserializing a HashMap, I follow serde's "Implement Deserialize for a custom map type" example:
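fn de_first_value<'de, D>(deserializer: D) -> Result<HashMap<i32, String>, D::Error>
where
    D: serde::de::Deserializer<'de>,
{
    // Hint that a map is expected, as in serde's custom map example,
    // so that the visitor's visit_map is the method that gets called.
    deserializer.deserialize_map(MyHmVisitor)
}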
Then I define MyHmVisitor and implement the Visitor trait for it. To deserialize a HashMap, I have to implement the visit_map function. I assume I could deserialize other types in a similar way by overriding the other provided visit_* methods, each of which fails with a type error unless overridden:
struct MyHmVisitor;

impl<'de> serde::de::Visitor<'de> for MyHmVisitor {
    type Value = HashMap<i32, String>;

    /// "This is used in error messages. The message should complete the sentence
    /// 'This Visitor expects to receive ...'"
    /// https://docs.serde.rs/src/serde/de/mod.rs.html#1270
    fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(formatter, "a HashMap<i32, String>")
    }

    fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
    where
        A: serde::de::MapAccess<'de>,
    {
        // Extract the size hint from the serialized map. If it doesn't exist, default to 0.
        let capacity = map.size_hint().unwrap_or(0);
        let mut hm = HashMap::with_capacity(capacity);
        while let Some((k, v)) = map.next_entry()? {
            hm.insert(k, v);
        }
        // visit_map returns a Result, so the finished map must be wrapped in Ok.
        Ok(hm)
    }
}
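Since the question also mentions AHashMap: the loop inside visit_map does not depend on the default hasher, so the same pattern should carry over to other hashers. Below is a minimal sketch, using only the standard library, of that loop written generically over the hasher. The helper name collect_entries and the MAX_PREALLOC cap are hypothetical, the iterator stands in for the MapAccess entries, and I am assuming (not verifying here) that the hasher type used by AHashMap implements BuildHasher + Default.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};

/// Hypothetical upper bound on how much space to reserve from a size hint,
/// so that a corrupt or hostile length prefix cannot force a huge allocation.
const MAX_PREALLOC: usize = 4096;

/// Collect key/value pairs into a HashMap that uses hasher `S`.
/// `size_hint` plays the role of `map.size_hint()` in `visit_map`.
fn collect_entries<K, V, S, I>(entries: I, size_hint: Option<usize>) -> HashMap<K, V, S>
where
    K: Eq + Hash,
    S: BuildHasher + Default,
    I: IntoIterator<Item = (K, V)>,
{
    // Reserve up front, but never more than the cap.
    let capacity = size_hint.unwrap_or(0).min(MAX_PREALLOC);
    let mut map = HashMap::with_capacity_and_hasher(capacity, S::default());
    for (k, v) in entries {
        map.insert(k, v);
    }
    map
}
```

In a visitor, the for loop would become the while let Some((k, v)) = map.next_entry()? loop shown above, and Self::Value would name the map type with the chosen hasher. Capping the hint is a design choice: the map still grows as needed if more entries arrive than were reserved.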