Tags: python, json, serialization, python-c-api, ujson

Manipulate JSON without deserializing it


An aiohttp application fetches a JSON document from an external resource and needs to pass it as the request body of another request.

To avoid serialization/deserialization overhead, ujson is used, and the JSON is simply passed along to the subsequent request without ever being loaded or dumped. This works, but the JSON can only be forwarded this way, not manipulated.

There is probably no way to manipulate it without deserializing it, but since ujson is used, the object is first deserialized as a C object. With that in mind, is there a way to keep manipulating this object at the C level without ever turning it into a Python dict? Example operations would be deleting keys from the JSON, creating a new JSON with just a subset of the original, or checking whether a given key exists in it.
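For reference, the round trip the question hopes to avoid might look like the sketch below. It uses only the standard-library `json` module rather than ujson, and the helper names `has_key` and `drop_keys` are made up for illustration. Each call materializes the whole document as a Python dict, which is exactly the overhead being asked about.

```python
import json


def has_key(raw: bytes, key: str) -> bool:
    """Check whether a top-level key exists (parses the full document)."""
    doc = json.loads(raw)
    return isinstance(doc, dict) and key in doc


def drop_keys(raw: bytes, keys: set) -> bytes:
    """Return a new JSON body without the given top-level keys."""
    doc = json.loads(raw)
    trimmed = {k: v for k, v in doc.items() if k not in keys}
    return json.dumps(trimmed).encode("utf-8")


# Hypothetical payload standing in for the fetched response body.
payload = b'{"id": 1, "token": "abc", "name": "x"}'
print(has_key(payload, "token"))
print(drop_keys(payload, {"token"}))
```

The bytes returned by `drop_keys` could then be sent as the body of the next request.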


Solution

  • This might help you out: https://github.com/lemire/simdjson

    I don't completely understand the use case, but it's a library that describes its aim as follows:

    "We provide a fast parser, that fully validates an input according to various specifications. The parser builds a useful immutable (read-only) DOM (document-object model) which can be later accessed."

    It's a bit specific: it requires CPUs that support certain SIMD instruction sets, as well as specific compilers, but it seems to me it could fit your use case.

    It also has wrappers for other languages, including Python.
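A rough sketch of how the Python wrapper might be used to build a subset of a document is shown below. It assumes the third-party `pysimdjson` package (imported as `simdjson`) is installed and falls back to the standard-library `json` module when it is not, so the sketch still runs. The helper name `pick_keys` is hypothetical. Because the DOM is read-only, "deleting" keys here means building a new document from the keys you keep; only the values that are actually accessed get converted to Python objects.

```python
import json

try:
    import simdjson  # third-party "pysimdjson" package; assumed to be installed
except ImportError:
    simdjson = None  # fall back to the standard library so the sketch still runs


def pick_keys(raw: bytes, wanted: list) -> bytes:
    """Return a new JSON body containing only the wanted top-level keys."""
    if simdjson is not None:
        # Parser.parse returns a lazy, read-only proxy over the parsed document.
        doc = simdjson.Parser().parse(raw)
    else:
        doc = json.loads(raw)

    subset = {}
    for key in wanted:
        if key in doc:
            value = doc[key]
            # Under pysimdjson, nested containers come back as proxies;
            # convert them so json.dumps can serialize them.
            if hasattr(value, "as_dict"):
                value = value.as_dict()
            elif hasattr(value, "as_list"):
                value = value.as_list()
            subset[key] = value
    return json.dumps(subset).encode("utf-8")


# Hypothetical payload standing in for the fetched response body.
raw = b'{"id": 7, "tags": ["a", "b"], "secret": "x"}'
print(pick_keys(raw, ["id", "tags", "missing"]))
```

Whether this is actually faster than a plain load and dump depends on how much of the document is kept, so it is worth measuring on real payloads.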