Tags: apache-kafka, google-bigquery, google-cloud-storage, pinterest

secor ignores message.timestamp.input.pattern


I'm trying to load data from Google Cloud Storage into BigQuery using Pinterest Secor. BigQuery expects timestamps like "2019-08-16 15:30:00", while Secor's JsonMessageParser uses a long integer value. BigQuery can accept integers, but for some reason, when I load integer timestamps, it converts seconds to microseconds (adds 6 zeros) and then complains that the timestamp is out of range.

I set this in the Secor config:

# Name of field that contains a timestamp, as a date Format, for JSON. (2014-08-07, Jul 23 02:16:57 2005, etc...)
# Should be used when there is no timestamp in a Long format. Also ignore time zones.
message.timestamp.input.pattern=ts

This setting does nothing. Without the message.timestamp.name field, Secor uses "1970-01-01" as the timestamp date. No matter what I set in message.timestamp.input.pattern, it seems to be ignored.

Loading JSON data into BigQuery with a timestamp value like "2019-08-16 15:30:00" works, but I cannot make Secor recognize such values as timestamps.

Any idea how to fix that?


Solution

  • You can write a custom implementation of TimestampedMessageParser: extend the class and override the methods extractPartitions(Message payload) and parse(Message message).

    In parse, get the byte array mPayload from the message object; it holds your JSON data as bytes. Use any JSON library to extract the timestamp and convert it to whatever format you need. Then serialize the JSON object back to a byte array and assign it to the payload of the message object. In extractPartitions, use the updated value to create the partitions as required.
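
The timestamp conversion step described above can be sketched with plain java.time, independent of Secor. This is only a minimal sketch: the class name TimestampConversion and its two helper methods are hypothetical, the pattern "yyyy-MM-dd HH:mm:ss" is taken from the example in the question, and the values are assumed to be in UTC. Extracting the field from the JSON payload and wiring the helpers into a TimestampedMessageParser subclass are left out.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class; it is not part of Secor.
public class TimestampConversion {
    // Assumed pattern, matching the "2019-08-16 15:30:00" example in the question.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Parses a date string (assumed to be UTC) into epoch milliseconds. */
    public static long toEpochMillis(String text) {
        return LocalDateTime.parse(text, FORMAT)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    /** Formats epoch milliseconds as a UTC date string in the same pattern. */
    public static String toDateString(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC)
                .format(FORMAT);
    }

    public static void main(String[] args) {
        long millis = toEpochMillis("2019-08-16 15:30:00");
        System.out.println(millis);               // prints 1565969400000
        System.out.println(toDateString(millis)); // prints 2019-08-16 15:30:00
    }
}
```

Inside a custom parser, toEpochMillis could supply the long value used for partitioning, while toDateString could produce the string form written back into the JSON for BigQuery. If the source timestamps are not in UTC, the zone offset would need to change accordingly.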