Tags: json, mongodb, etl, kettle, avro

Date Field Schema in Avro Input Kettle


I'm using Pentaho Data Integration (Kettle) for an ETL process, extracting from a MongoDB source.

My source has an ISODate field, so the JSON returned by the extraction looks like:

{ "_id" : { "$oid" : "533a0180e4b026f66594a13b"} , "fac_fecha" : { "$date" : "2014-04-01T00:00:00.760Z"} , "fac_fedlogin" : "KAYAK"}
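For context, the `$date` wrapper is MongoDB extended JSON: the date arrives as a nested object holding an ISO-8601 string, not as a plain string value. A minimal Python sketch (outside Kettle, just to show the structure of the document above) that pulls the timestamp out:

```python
import json
from datetime import datetime

# The sample document from the question, in MongoDB extended JSON.
doc = json.loads(
    '{ "_id" : { "$oid" : "533a0180e4b026f66594a13b"}, '
    '"fac_fecha" : { "$date" : "2014-04-01T00:00:00.760Z"}, '
    '"fac_fedlogin" : "KAYAK"}'
)

# "$date" holds an ISO-8601 string; swap the trailing "Z" for an
# explicit UTC offset so fromisoformat (Python 3.7+) accepts it.
raw = doc["fac_fecha"]["$date"]
fecha = datetime.fromisoformat(raw.replace("Z", "+00:00"))
print(fecha)  # 2014-04-01 00:00:00.760000+00:00
```

Note that `fac_fecha` is an object, which is why a schema declaring it as a plain `"string"` will not match it.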

Now I have to deserialize this JSON with an Avro Input step, so I've defined the Avro schema like this:

{
  "type": "record",
  "name": "xml_feeds",
  "fields": [
      {"name": "fac_fedlogin", "type": "string"},
      {"name": "fac_empcod", "type": "string"},
      {"name": "fac_fecha", "type": "string"}
  ]
}

It would be fine if fac_fecha could be a date type, but Avro doesn't support one.
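As an aside, newer Avro releases (1.8 and later) can express timestamps through logical types layered on a `long`; whether the Avro Input step shipped with this Kettle version honours them is doubtful, so treat this schema only as a sketch of that approach:

```json
{
  "type": "record",
  "name": "xml_feeds",
  "fields": [
      {"name": "fac_fedlogin", "type": "string"},
      {"name": "fac_empcod", "type": "string"},
      {"name": "fac_fecha",
       "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```

With `timestamp-millis`, the field is stored as milliseconds since the Unix epoch, and readers that understand logical types surface it as a timestamp.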

At execution time, the Avro Input step rejects all rows with an error. This only occurs when I use the date field.

Any suggestions on how I can do this?

Kettle version: 4.4.0 Pentaho-big-data-plugin: 1.3.0


Solution

  • The easiest solution I found for this problem was upgrading the Pentaho Big Data Plugin to a newer version, 1.3.3.

    With this new version, explicitly defining a schema for the MongoDB Input JSON is no longer necessary. The final solution looks like this:

    Global view: (screenshot)

    And inside MongoDB Input:

    (screenshot)

    The schema is determined automatically, and it can be modified.