
Elasticsearch transform does not update aggregated data


I am working on a scenario that aggregates daily data per user. The data is processed in real time and stored in Elasticsearch. Now I want to use an Elasticsearch feature to aggregate the data in real time. I've read about transforms in Elasticsearch and found that they fit our case. The problem is that when the source index is updated, the destination index, which is supposed to hold the aggregation, is not updated. This is the case I tested:

source_index data model:

{
 "my_datetime": "2021-06-26T08:50:59",
 "client_no": "1",
 "my_date": "2021-06-26",
 "amount": 1000
}

and the transform I defined:

PUT _transform/my_transform
{
  "source": {
    "index": "dest_index"
  },
  "pivot": {
    "group_by": {
      "client_no": {
        "terms": {
          "field": "client_no"
        }
      },
      "my_date": {
        "terms": {
          "field": "my_date"
        }
      }
    },
    "aggregations": {
      "sum_amount": {
        "sum": {
          "field": "amount"
        }
      },
      "count_amount": {
        "value_count": {
          "field": "amount"
        }
      }
    }
  },
  "description": "total amount sum per client",
  "dest": {
    "index": "my_analytic"
  },
  "frequency": "60s",
  "sync": {
    "time": {
      "field": "my_datetime",
      "delay": "10s"
    }
  }
}

Now when I add another document or update existing documents in the source index, the destination index is not updated and does not reflect the new documents. Note that I am using Elasticsearch 7.13. I also changed the date field to an epoch timestamp (like 1624740659000) but still have the same problem.

What am I doing wrong here?


Solution

  • Could it be that the values in "my_datetime" are further in the past than the "delay": "10s" (plus the "frequency": "60s" interval)?

    The docs for sync.time.field note:

    In general, it’s a good idea to use a field that contains the ingest timestamp. If you use a different field, you might need to set the delay such that it accounts for data transmission delays.

    You might just need a higher delay.
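
    As a rough sketch of both suggestions, the requests below stamp each document with its ingest time and then point the transform's sync at that field with a longer delay. The pipeline name add_ingest_time, the field name ingest_time, and the "120s" delay are made up for illustration, and source_index stands for whichever index the transform actually reads from. This assumes the update API accepts the sync change on your version and that dynamic mapping detects ingest_time as a date; if either does not hold, recreating the transform with the new sync settings and an explicit date mapping should have the same effect.

```
# Hypothetical pipeline: add the ingest timestamp to every document
PUT _ingest/pipeline/add_ingest_time
{
  "description": "Stamp documents with their ingest time (illustrative)",
  "processors": [
    {
      "set": {
        "field": "ingest_time",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

# Index new documents through the pipeline
POST source_index/_doc?pipeline=add_ingest_time
{
  "my_datetime": "2021-06-26T08:50:59",
  "client_no": "1",
  "my_date": "2021-06-26",
  "amount": 1000
}

# Sync on the ingest timestamp and allow more lag (the delay value is a guess)
POST _transform/my_transform/_update
{
  "sync": {
    "time": {
      "field": "ingest_time",
      "delay": "120s"
    }
  }
}

# Check whether new checkpoints are being processed
GET _transform/my_transform/_stats
```

    With the sync field set at ingest time, a document carrying an old "my_datetime" should still look new to the transform, since only ingest_time is compared against the checkpoint window.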