rust serde

How to deserialize and transcode arbitrary value type in Rust?


I am writing a proxy that translates MessagePack messages to HTTP (using the Rocket framework).

Since MessagePack is a binary protocol and hard to show here, I show the messages in their equivalent JSON form below.

The input

  • if error: {"error":{"label":123, "message":"invalid args"}}, then I respond with HTTP_400 or HTTP_500 depending on the error.label field. This part is easy.

  • if good: {"result": … }, where … stands for an arbitrary value, which may be a string, an array, or an arbitrary object. Then I respond with HTTP_200, with the value of result in JSON format as the body. I do not know how to do this part.

Is "Manually implementing Deserialize for a struct" the right place to look? The example there does not handle an arbitrary value type, though.


Sorry that I did not explain my question clearly. Let me add some more explanation here.

I know how to deserialize a fixed value type: define a corresponding struct/enum and put #[derive(Deserialize)] on it.

I know how to transcode a completely unknown value type: use the serde-transcode crate.

But now I need to deserialize only part of the input: deserialize the error field, if present, but transcode the result field from MsgPack to JSON without knowing its definition.


Solution

  • This isn't my first answer like this, but smuggling data into a Deserializer always ends up extremely verbose for me.

    So first some theory: you need a Deserializer per "data type", that is, per value of a given shape at a given level inside your data. In your case, there are three:

    • The top level {"error": …} or {"result": …} needs a custom enum-like deserializer.
    • The value in error: {"label":123, "message":"invalid args"}. This deserializer is easy; it can be derived.
    • The value in result needs a deserializer that actually calls serde_transcode.

    Two deserializers need to be written by hand. For the top-level deserializer, our goal is just to recognize whether we have error or result, and to smuggle some data into the result deserializer. That data is the target serializer S into which the result data is written. The method of choice for smuggling data through deserializers is DeserializeSeed. It requires a whole bunch of boilerplate; the only "decision" we make in this code is to call deserialize_map (because the top-level value is a { … }):

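    use serde::de::{DeserializeSeed, Visitor};
    use serde::Serializer;

    // Carries the target serializer into the deserialization of the top-level map.
    struct TranscodingDeserializeMessageSeed<S> {
        target: S,
    }
    
    impl<'de, S: Serializer> DeserializeSeed<'de> for TranscodingDeserializeMessageSeed<S> {
        // Ok(()) for a transcoded result, Err(..) for a decoded error payload.
        // `Error` is the derived struct for the value of the error key.
        type Value = Result<(), Error>;
    
        fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
        where
            D: serde::Deserializer<'de>,
        {
            // The visitor only exists to receive the map; it owns the serializer.
            struct V<S> {
                target: S,
            }
            impl<'de, S: Serializer> Visitor<'de> for V<S> {
                // type Value = Result<(), Error>; plus expecting() and visit_map() (see below)
                // ...
            }
            deserializer.deserialize_map(V {
                target: self.target,
            })
        }
    }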
    

    The actual magic of deciding result or error happens inside the visitor implementation. If it is result, we hand S to the second manually implemented DeserializeSeed with next_value_seed:

    fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
    where
        A: serde::de::MapAccess<'de>,
    {
        match map.next_key()? {
            Some("error") => Ok(Err(map.next_value()?)),
            Some("result") => map
                .next_value_seed(TranscodeSeed {
                    target: self.target,
                })
                .map(Ok),
            _ => Err(serde::de::Error::custom("Expected key error or result")),
        }
    }
    

    Now, we can finally hand off to serde_transcode:

    struct TranscodeSeed<S> {
        target: S,
    }
    
    impl<'de, S: Serializer> DeserializeSeed<'de> for TranscodeSeed<S> {
        type Value = ();
    
        fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
        where
            D: serde::Deserializer<'de>,
        {
            match serde_transcode::transcode(deserializer, self.target) {
                Ok(_) => Ok(()),
                Err(e) => Err(serde::de::Error::custom(format!(
                    "Error while transcoding: {}",
                    e
                ))),
            }
        }
    }
    

    The final piece of the puzzle is actually invoking the outer DeserializeSeed:

    let mut extracted = vec![]; // I'll write to a vec, but you can use any (sync) writer, so you don't need to keep the entire message in memory.
    let mut extract = serde_json::Serializer::new(&mut extracted);
    let res = DeserializeSeed::deserialize(
        TranscodingDeserializeMessageSeed {
            target: &mut extract,
        },
        &mut de, // Assuming de is your MsgPack deserializer
    );
    

    And this brings me to a question: are you sure this is viable for you? It is guaranteed that no data is written to S if the message is an error or has some other invalid type, even if it feels a bit icky to construct a serializer and then rely on nothing being written through it. The real problem is a MsgPack message that starts out as a valid result message and then turns out to be invalid in the middle: if the serializer streams straight into the response, you will already have sent an HTTP 200 with a JSON body that is cut off in the middle by the time deserialize returns Err.
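
    One way to sidestep the truncated-body problem is to buffer the transcoded JSON and pick the status code only after the whole message has been processed. The sketch below uses only the standard library. The names respond, ProxyError and transcode_into are hypothetical; the closure stands in for the DeserializeSeed call above. The label-to-status rule and the 502 for malformed input are pure assumptions for illustration, since the question does not say how labels map to status codes.

```rust
/// Hypothetical stand-in for the decoded `error` payload.
#[derive(Debug)]
pub struct ProxyError {
    pub label: i64,
    pub message: String,
}

/// Runs `transcode_into` against a fresh in-memory buffer and returns
/// (status, body). HTTP 200 is chosen only after the closure has finished
/// successfully, so a half-written JSON body is never sent as a success.
///
/// The closure's return value mirrors the shape of `res` above:
/// - `Ok(Ok(()))` : a result message was fully written to the buffer
/// - `Ok(Err(e))` : an error message was decoded, nothing was written
/// - `Err(msg)`   : the input was malformed, possibly after partial output
pub fn respond<F>(transcode_into: F) -> (u16, Vec<u8>)
where
    F: FnOnce(&mut Vec<u8>) -> Result<Result<(), ProxyError>, String>,
{
    let mut body = Vec::new();
    match transcode_into(&mut body) {
        Ok(Ok(())) => (200, body),
        Ok(Err(e)) => {
            // ASSUMPTION: labels below 500 are client errors. The real rule
            // depends on the protocol and is not given in the question.
            let status = if e.label < 500 { 400 } else { 500 };
            (status, e.message.into_bytes())
        }
        // Drop whatever partial JSON is in `body` and report the failure.
        // ASSUMPTION: 502 is used here for malformed upstream input.
        Err(msg) => (502, msg.into_bytes()),
    }
}
```

    The trade-off is that the entire JSON body is held in memory before anything is sent, which gives up the streaming benefit of writing directly to a writer.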