How to inject a new field into a generated message

I have a bunch of (legacy) messages generated from Protocol Buffers (v3) definitions. I can't change them (or their .proto files). I use them both for transmitting data downstream and as a data structure within my application. For various reasons and uses, my application needs to know when it received those messages, i.e., a timestamp. The messages (and their submessages), however, do not include a field to store timestamps.

What I want is to take those messages on reception, inject a new 'timestamp' field (long or Timestamp, both are fine) into them, then pass them along inside my application, and, perhaps, onward to other applications downstream. I don't want to (and essentially can't) define a whole slew of 'shadow' (sub)messages (backward compatibility etc.).

Protobuf's documentation says that I can do this. That it is straightforward even, and a 'good thing' about protobuf. No doubt.

But neither the documentation nor an hours-long search on the internet turned up a (Java) code example of what, I believe, should be a basic use case (of DynamicMessage?). Or even just a basic step-by-step explanation of what you'd need to do, using what. I think I found some pieces of the puzzle, but lack too many others to make it stick.

I've been trying to do something like this:

public static <T extends Message> T injectTimestamp(Instant timestamp, T message) {
   // 1. Determine highest/free index number from message
   // 2. Extract message descriptor (builder?)
   // 3. Create new timestamp field using FieldDescriptor builder (?)
   // 4. Add new timestamp field to extracted message descriptor
   // 5. Build new message (type) (with timestamp) from old message using new descriptor
   // 6. Set new timestamp field with timestamp
   // 7. Build new message and return (yay!)
}

I can (kinda) do 1, 3, and 6; but the rest (still) eludes me.

I have written a (Java) wrapper to do this for me within my application (TimestampedContainer<T extends Message>), but that doesn't solve the problem when I want/need to send these messages to applications further down the software-stack.

Clarification
From the comments below, it seems my description of the problem is somewhat unclear. Sorry about that. Perhaps a clarification will help.

My application is somewhere in the middle of a data processing pipeline. It takes protobuf messages up-the-wire, processes and updates the data contained within, and passes them along to other applications down-the-wire.

The message schemas in the .proto files are defined up-the-wire, and out of my control; i.e., I can't revise them. The applications down-the-wire rely on those schemas in their business logic as well, so I can't wrap the received messages in new messages either, as in:

message MessageB {
   int64 timestamp = 1;
   MessageA orig_message = 2;
}

What I would like my application to do is to dynamically, at runtime, use protobuf's reflection and message generating methods/classes, and: unpack the schema from the received messages; add a timestamp field to the schema; create a new message (type) from that; copy the data from the original message into it; add the timestamp data; and pass it along (either within the application, or down-the-wire).

For example, going from:

message MessageA {
   int32 data_int = 1;
   // lots more data fields ...
   string data_string = 31;
}

to:

message MessageA {
   int32 data_int = 1;
   // lots more data fields ...
   string data_string = 31;
   int64 timestamp = 32;
}

Whether the applications down-the-wire use the newly added timestamp field is entirely up to them.

In a way, I want to do a bit like what protoc does, at runtime, dynamically.

I think this should be possible; after all, the protobuf documentation says:

Updating A Message Type
If an existing message type no longer meets all your needs – for example, you’d like the message format to have an extra field – but you’d still like to use code created with the old format, don’t worry! It’s very simple to update message types without breaking any of your existing code when you use the binary wire format.

Solution

Let's take this step by step, see if I've got the problem understood.

"What I want is to take those messages on reception,"

Ok, I'm assuming that means receive a whole demarcated message somehow, and you are parsing it using code generated from the original schema for the message. You now have an object in memory, of a class generated by the protoc compiler from the schema.

"inject a new 'timestamp' field (long or Timestamp, both are fine) into them,"

There is no field in the object that you have in memory into which a timestamp value can be written. The options here are:

Write another class that incorporates the original message class and a timestamp field. Create a new object from that, setting the message and timestamp field. Or,
Create such a class by adding a new message to the .proto schema.

The latter would look something like the following, in your schema:

message TimestampedMessage
{
    int64 timestamp = 1;
    YourOriginalMessage msg = 2;
}

So far, both amount to pretty much the same thing.

"then pass them along inside my application,"

That sounds like simply passing around, inside the application, a reference to the object that combines the message and the timestamp.
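
As a rough illustration of the first option, a plain Java wrapper might look like the sketch below. The class name TimestampedContainer is borrowed from the question; everything else (receivedNow, the accessors) is hypothetical, and the type parameter is left unbounded only so that the sketch compiles without the protobuf library on the classpath. In the real application it would presumably be <T extends Message>.

```java
import java.time.Instant;
import java.util.Objects;

/**
 * Pairs a received message with the time it was received.
 * Hypothetical sketch: T is unbounded here so the example needs no protobuf
 * dependency; in the application it would be bounded by the Message type.
 */
final class TimestampedContainer<T> {
    private final Instant receivedAt;
    private final T message;

    TimestampedContainer(Instant receivedAt, T message) {
        this.receivedAt = Objects.requireNonNull(receivedAt, "receivedAt");
        this.message = Objects.requireNonNull(message, "message");
    }

    /** Stamps a message with the current time at the point of reception. */
    static <T> TimestampedContainer<T> receivedNow(T message) {
        return new TimestampedContainer<>(Instant.now(), message);
    }

    Instant receivedAt() {
        return receivedAt;
    }

    T message() {
        return message;
    }
}
```

The generated message object stays untouched, which is why this works inside one application but says nothing about what goes on the wire.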

"and, perhaps, onward to other applications downstream."

That would best be accomplished had the extended class been created by adding a new message definition to the .proto schema. This is because protoc has built you a serialisable class, other projects can consume it, etc.

"I don't want to (and essentially can't) define a whole slew of 'shadow' (sub)message (backward compatibility etc.)."

And this is where I get stuck, and wonder what the real problem is. If there is a requirement to add information to a message and propagate it further throughout your system, you have to have a way of communicating the new information.

If the problem is that there are a lot of original messages, and you don't want to double the message count simply for the sake of a timestamp field, an option is a single message that has a oneof field covering the original message types, plus an integer field for the timestamp. This means only one new message type, no transcribing of fields from one object to another field by field, and very little re-work in your application and throughout the system.

message TimestampedMessage {
  int64 timestamp = 1;
  oneof msg {
    // messages from your original schema
    YourOriginalMessageType1 type1 = 2;
    YourOriginalMessageType2 type2 = 3;
  }
}

This is even extensible - add more to the oneof. Importantly you're not modifying YourOriginalMessageType1, etc.

Apologies if this has gone down the wrong alley; let us know, see what we can do :-)

In General

One of the reasons to use something like Google Protocol Buffers is to facilitate agility in what information looks like within a system throughout development. If a bunch of messages had been defined without a field that, now, one wishes they all had, and one already had valuable data, then something like what I've outlined is a reasonable way to proceed.

Your code snippet feels like you're trying to programmatically do the work of protoc to some extent. The issue with it is, how on earth do you communicate the consequences / results to another part of the system? If you're dynamically creating new message formats without a schema, how does the rest of the system know what to do?

EDIT

If You Have Some Control

If you have some control of, or the ability to come to an agreement with, the downstream consumers, you could pursue your own schema additions independent of the upstream. You simply have your own schema file that imports the original and defines a new message (probably along the lines of the oneof idea above). That way, the only thing you're taking control of is how the timestamp is communicated downstream alongside the original message. If the controllers of the original schema want to update their message definitions, so be it; they can do so, but they won't be altering how you communicate the timestamp.

It could get confusing if they add their own timestamp field, so there's still need for collaboration and agreement!

This isn't so far removed from what you're planning on doing anyway. The way you're trying to do it amounts to the same thing as agreeing, in a schema, to append a timestamp field to the end of each and every message.

EDIT 2 - Resolved

The resolution of this question is that, when one dynamically adds a field to a message object using the API GPB provides in Java (and C++?), no description of the new field gets wrapped up in the resulting wire-format data, so nothing is propagated to, or acted on automatically by, a recipient (regardless of programming language). The description does not travel with the message data. This was the point of misunderstanding.

Thus the course of action originally being pursued by our O.P. BarthCrane would have resulted in recipients getting message data containing tags and values they'd not be able to parse or process.
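
To make that concrete, here is a minimal sketch of what actually ends up in the bytes. It deliberately avoids the protobuf library and hand-encodes the MessageA example from the question, assuming data_int = 150 and data_string = "hi" purely for illustration; the class and method names are hypothetical. It relies only on the documented wire format: each field is a varint key of (field number << 3) | wire type, followed by the value.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class WireFormatSketch {

    /** Writes an unsigned base-128 varint, least significant 7-bit group first. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Reads a varint starting at pos[0] and advances pos[0] past it. */
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** Appends an int64 field (key plus varint value) to already-serialized bytes. */
    static byte[] appendInt64Field(byte[] serialized, int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(serialized, 0, serialized.length);
        writeVarint(out, ((long) fieldNumber << 3) | 0); // wire type 0 = varint
        writeVarint(out, value);
        return out.toByteArray();
    }

    /** Lists field numbers the way a reader without any schema sees them. */
    static List<Integer> fieldNumbers(byte[] data) {
        List<Integer> numbers = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            long key = readVarint(data, pos);
            numbers.add((int) (key >>> 3));
            int wireType = (int) (key & 7);
            if (wireType == 0) {            // varint: read and discard the value
                readVarint(data, pos);
            } else if (wireType == 1) {     // fixed 64-bit value
                pos[0] += 8;
            } else if (wireType == 2) {     // length-delimited: skip the payload
                int length = (int) readVarint(data, pos);
                pos[0] += length;
            } else if (wireType == 5) {     // fixed 32-bit value
                pos[0] += 4;
            } else {
                throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Hand-encoded MessageA: field 1 = 150 (varint), field 31 = "hi" (length-delimited).
        byte[] original = {0x08, (byte) 0x96, 0x01, (byte) 0xFA, 0x01, 0x02, 'h', 'i'};
        byte[] stamped = appendInt64Field(original, 32, 1700000000000L);
        System.out.println(fieldNumbers(stamped)); // prints [1, 31, 32]
    }
}
```

All the recipient gets for the new field is the number 32, a wire type and a value. The name "timestamp", and the fact that it is meant as a reception time, exist only in whatever schema (or dynamic descriptor) the sender used, so the recipient has to learn about them some other way.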

Dynamic modification of a message object within a program is no different in concept to editing the schema; every emitter / recipient of messages has to know to make the same dynamic changes, if they're going to be able to understand the modified messages, just as they'd otherwise have to be using code built from the same schema.

There is the concept of self describing messages, but you still have to have a starting schema that supports that and recipients that know how to programmatically build a parser for them from the description received.

Self-modifying code: there be dragons, approach with caution!