apache-kafka ksqldb

KSQL: How can I change the separator (comma) of the DELIMITED format?


I am trying to load a large number of messages (350M) into the customer topic (the source topic), with values formatted like this:

10957402000||10965746672||2|2756561822|452048703649890|8984048701003649890

I then create some streams and tables on that topic, but the DELIMITED format supported by KSQL only accepts a comma as the separator. I have some questions:

  • Is there any way to configure KSQL to understand my format, or do I have to convert the data to KSQL's default format (comma-separated)?
  • Given the original value from the source topic shown above, how does the following command map the values to table columns? Or do I have to convert the data to JSON?

    CREATE STREAM (sub_id BIGINT, contract_id BIGINT, cust_id BIGINT, account_id BIGINT, telecom_service_id BIGINT, isdn BIGINT, imsi BIGINT) \
      WITH (KAFKA_TOPIC='customer', VALUE_FORMAT='DELIMITED');

Thank you.


Solution


  • Edit (26 February 2021): ksqlDB now supports configurable delimiters. Use the VALUE_DELIMITER (or KEY_DELIMITER) configuration option. For example:

    -- my_stream is a placeholder stream name
    CREATE STREAM my_stream (COL1 INT, COL2 VARCHAR)
      WITH (KAFKA_TOPIC='test', VALUE_FORMAT='DELIMITED', VALUE_DELIMITER='TAB');
    

    Original answer:

    Currently KSQL only supports comma-separated values for the DELIMITED value format. So you'll need to use commas, or JSON, or Avro, for your source data.
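
Applied to the pipe-separated records from the question, the VALUE_DELIMITER option could be used roughly as follows. This is a minimal sketch rather than a tested statement: the stream name customer_stream is a placeholder, the column list is copied unchanged from the question, and it assumes a ksqlDB version that supports VALUE_DELIMITER. Because DELIMITED maps fields to columns by position, the column list has to match the fields in each record. The sample line in the question appears to split into eight fields on '|' (two of them empty), while the schema lists seven columns, so the columns would likely need adjusting before this works on the real data.

```sql
-- Sketch only: customer_stream is a placeholder name, and the columns
-- are copied from the question. Adjust them to match the actual field
-- count and order of the records in the topic.
-- VALUE_DELIMITER takes a single character; for whitespace delimiters
-- the special values 'SPACE' or 'TAB' are used instead.
CREATE STREAM customer_stream (
    sub_id BIGINT,
    contract_id BIGINT,
    cust_id BIGINT,
    account_id BIGINT,
    telecom_service_id BIGINT,
    isdn BIGINT,
    imsi BIGINT
  ) WITH (
    KAFKA_TOPIC='customer',
    VALUE_FORMAT='DELIMITED',
    VALUE_DELIMITER='|'
  );
```

With this approach the 350M existing messages would not need to be rewritten as comma-separated, JSON, or Avro data; only the stream definition changes.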