
Using Box to optimise memory allocation of optional, known length arrays


I have a question about memory allocation strategies in a specific situation.

I have a large number of structs that are coming from a JSON API via serde. There is a specific field in the JSON response which will, for a small minority of cases, exist and contain an array of exactly 11 elements. In the majority of cases this field will not exist at all in the response. All responses need to be stored.

The majority of responses will be of the format:

{
  "id": 1,
  "event": "None"
}

A minority of responses will be of the format:

{
  "id": 2,
  "event": "something",
  "event_details": [11 json objects here]
}

If I define my struct for parsing the JSON as:

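use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Event {
  id: u32,
  event: String,
  // EventDetail is a struct with a u32 field and a String field.
  // The JSON key is event_details, so map it onto this field.
  #[serde(rename = "event_details")]
  events: Option<[EventDetail; 11]>,
}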

Using the deepsize crate, I can see that the resulting Vec<Event> takes up ~500MB of memory. If I change the events field to events: Option<Box<[EventDetail; 11]>>, the memory usage drops to ~150MB.

My understanding is that since the contents of a Box are allocated at runtime, rather than being reserved inside the struct at compile time, the 11-item array does not need to be allocated when a JSON response lacks the events field. Am I correct here? And is this a valid use case for Box, or is there a better way to do this?

I also tried events: Option<Vec<EventDetail>>, which resulted in slightly higher memory usage than Box. I assume this is down to padding.


Solution

  • Yes, this is all correct. Option<[T; N]> always reserves space for the whole array inline, even when the value is None, whereas Option<Box<...>> only reserves space for the box pointer; the array it points to is allocated on the heap only when it is needed. Hiding the large data behind a Box indirection is exactly the right call here.

    Vec is a pointer (basically a Box) plus the length of the vector and its capacity, so Option<Vec<...>> will usually have three times the overhead, since it reserves space for the pointer, the length, and the capacity of the (potentially nonexistent) vector. That extra overhead, rather than padding, is the most likely reason for the slightly higher usage you measured. But at that point we're arguing over a handful of bytes per element, so the difference rarely matters.

    (Vec also stores the memory allocator, but the default allocator, Global, is a zero-sized type.)

    Since you know the size in advance, I'd go with Option<Box<[EventDetail; 11]>> rather than the vector, since the vector just defers to runtime a length that is already known at compile time.
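
To check the inline sizes discussed above, here is a minimal sketch using std::mem::size_of. The EventDetail definition is a hypothetical stand-in based on the question's description (a u32 and a String), and layout_sizes is an illustrative helper name, not something from the original code. The Vec figure reflects how current toolchains lay out a Vec rather than a documented guarantee.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the question's EventDetail (a u32 and a String).
#[allow(dead_code)]
struct EventDetail {
    code: u32,
    text: String,
}

// Returns the per-element inline size of the three candidate field types:
// (inline array, boxed array, Vec).
fn layout_sizes() -> (usize, usize, usize) {
    (
        // The whole array lives inside the Option, whether it is Some or None.
        size_of::<Option<[EventDetail; 11]>>(),
        // Only a pointer is stored inline; None is represented as a null pointer.
        size_of::<Option<Box<[EventDetail; 11]>>>(),
        // Pointer, capacity, and length are stored inline.
        size_of::<Option<Vec<EventDetail>>>(),
    )
}

fn main() {
    let (inline, boxed, vec) = layout_sizes();
    println!("Option<[EventDetail; 11]>      : {inline} bytes");
    println!("Option<Box<[EventDetail; 11]>> : {boxed} bytes");
    println!("Option<Vec<EventDetail>>       : {vec} bytes");

    // The inline array can never be smaller than its 11 elements.
    assert!(inline >= 11 * size_of::<EventDetail>());
    // The boxed form costs exactly one pointer per Event.
    assert_eq!(boxed, size_of::<usize>());
    // The Vec form costs three pointer-sized words per Event.
    assert_eq!(vec, 3 * size_of::<usize>());
}
```

On a typical 64-bit target the boxed form costs 8 bytes per Event and the Vec form 24 bytes, while the inline array costs at least 11 times the size of one EventDetail in every Event, including the majority that have no details at all.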