Tags: elasticsearch, kibana, kibana-7

Kibana top hits in a transform: group by one field instead of all fields


I need to use top hits in a transform so that I can show only the latest data for each email.

I have this data:

email      col2      col3     col4  col5    Time
a.com         a        a        a    a     11:00 
a.com         a        a        a    a     11:01 
a.com         a        b        a    a     11:02

I want to remove the duplicate emails and keep only the row with the latest time for each one. I'm using a transform: I group by every field I need (email, col2, col3, col4) and aggregate with max(Time). It returns data like this:

Current index:

email      col2      col3     col4  col5    Time
a.com         a        a        a    a     11:01 
a.com         a        b        a    a     11:02
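Why the current transform still returns two rows for a.com: a pivot creates one bucket per distinct combination of all group-by values, and col3 differs between the rows ("a" vs "b"). The plain-Python sketch below only mimics that bucketing on the sample table (it is not Elasticsearch code); pivot_max_time is a hypothetical helper name. Note that col5 is neither grouped nor aggregated in the stated setup, so it is dropped here.

```python
from collections import defaultdict

# Sample rows from the question. Time is kept as zero-padded "HH:MM" strings,
# so ordinary string comparison orders them correctly.
ROWS = [
    {"email": "a.com", "col2": "a", "col3": "a", "col4": "a", "col5": "a", "Time": "11:00"},
    {"email": "a.com", "col2": "a", "col3": "a", "col4": "a", "col5": "a", "Time": "11:01"},
    {"email": "a.com", "col2": "a", "col3": "b", "col4": "a", "col5": "a", "Time": "11:02"},
]


def pivot_max_time(rows, group_fields):
    """Mimic a pivot: one bucket per distinct combination of group_fields,
    with a single max(Time) aggregation per bucket."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in group_fields)
        buckets[key].append(row["Time"])
    # Only the group-by fields and the aggregated value survive in the output.
    return [
        {**dict(zip(group_fields, key)), "Time": max(times)}
        for key, times in buckets.items()
    ]


result = pivot_max_time(ROWS, ["email", "col2", "col3", "col4"])
for row in result:
    print(row)
```

Because (a.com, a, a, a) and (a.com, a, b, a) are different keys, max(Time) is computed separately for each, which is exactly the "current index" shown above.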

What I want (my target):

email      col2      col3     col4  col5    Time
a.com         a        b        a    a     11:02

How can I make the transform group by email only, instead of every field? I need all the fields in the output, but I don't think adding all of them as group-by fields is right. However, a pivot transform only offers two options: group by or aggregation.
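Put differently, the target is "group by email, then keep the entire most recent document", which is what a top-hits style aggregation returns, rather than "group by every field and take max(Time)". The plain-Python sketch below only illustrates that desired result on the sample table; latest_per_key is a hypothetical helper, not part of any Elastic API.

```python
# Sample rows from the question (zero-padded "HH:MM" strings compare correctly).
ROWS = [
    {"email": "a.com", "col2": "a", "col3": "a", "col4": "a", "col5": "a", "Time": "11:00"},
    {"email": "a.com", "col2": "a", "col3": "a", "col4": "a", "col5": "a", "Time": "11:01"},
    {"email": "a.com", "col2": "a", "col3": "b", "col4": "a", "col5": "a", "Time": "11:02"},
]


def latest_per_key(rows, key_field, time_field):
    """Group by key_field only and keep the complete row with the largest
    time_field value (ties keep the first row seen)."""
    latest = {}
    for row in rows:
        key = row[key_field]
        current = latest.get(key)
        if current is None or row[time_field] > current[time_field]:
            latest[key] = row  # keep the whole document, not just the time
    return list(latest.values())


result = latest_per_key(ROWS, "email", "Time")
print(result)
```

Since the whole row is kept, col2 to col5 come along without being group-by fields, which matches the target table.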

My transform definition (it doesn't produce what I need):

{
  "id": "transform_baru",
  "source": {
    "index": [
      "email-profile-nov-bug*"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "transform_baru"
  },
  "pivot": {
    "group_by": {
      "Email.keyword": {
        "terms": {
          "field": "Email.keyword"
        }
      },
      "fa.keyword": {
        "terms": {
          "field": "fa.keyword"
        }
      },
      "ever.keyword": {
        "terms": {
          "field": "ever.keyword"
        }
      },
      "bln.keyword": {
        "terms": {
          "field": "bln.keyword"
        }
      },
      "domain.keyword": {
        "terms": {
          "field": "domain.keyword"
        }
      },
      "Email_age_category.keyword": {
        "terms": {
          "field": "Email_age_category.keyword"
        }
      },
      "Status_Category.keyword": {
        "terms": {
          "field": "Status_Category.keyword"
        }
      },
      "Vintage_cat.keyword": {
        "terms": {
          "field": "Vintage_cat.keyword"
        }
      }
    },
    "aggregations": {
      "extract_date.max": {
        "max": {
          "field": "extract_date"
        }
      }
    }
  },
  "settings": {},
  "version": "7.8.0",
  "create_time": 1607832008196
}

Solution

  • Problem solved by using this Tophit workaround, although at first I wasn't able to get it working. Here is how to use it:

    1. Choose only the group-by field you need. In my case, that is just Email.
    2. Edit the JSON config and add the aggregation with the latest_doc script.
    3. Change the '@timestamp' field in the script to your own time field.
    4. So technically, you use only Email as the group-by and latest_doc as the aggregation.
    5. The preview might show only the fields you chose as group-by, but once the transform index is created, the rest of the fields show up under latest_doc. So don't worry, just create the transform.

    I hope this helps other Elastic newcomers use this workaround.

    Thank you to everyone who tried to help me. Cheers!
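The steps above can be sketched as a transform definition that keeps only Email.keyword as the group-by and replaces max(extract_date) with a latest_doc scripted_metric aggregation. This is a sketch under assumptions, not a verified config: the Painless scripts are adapted from the "latest document" scripted_metric example in the Elastic transform documentation, with '@timestamp' swapped for extract_date (step 3); it assumes extract_date is mapped as a date and present in every document. Python is used here only to emit valid JSON with the multi-line scripts properly escaped; the printed body is intended for the Kibana JSON editor or a PUT _transform/transform_baru request. The id, version and create_time fields from the original definition are left out.

```python
import json

# Start with "no document seen yet" on each shard.
INIT_SCRIPT = "state.timestamp_latest = 0L; state.last_doc = ''"

# For each document: remember its full _source if it is newer than the current best.
# Assumption: every document has a value in the 'extract_date' date field.
MAP_SCRIPT = """
def current_date = doc['extract_date'].getValue().toInstant().toEpochMilli();
if (current_date > state.timestamp_latest) {
  state.timestamp_latest = current_date;
  state.last_doc = new HashMap(params['_source']);
}
"""

# Return the per-shard state unchanged to the reduce phase.
COMBINE_SCRIPT = "return state"

# Across shards: keep the document with the newest timestamp.
REDUCE_SCRIPT = """
def last_doc = '';
def timestamp_latest = 0L;
for (s in states) {
  if (s.timestamp_latest > timestamp_latest) {
    timestamp_latest = s.timestamp_latest;
    last_doc = s.last_doc;
  }
}
return last_doc
"""

transform = {
    "source": {"index": ["email-profile-nov-bug*"], "query": {"match_all": {}}},
    "dest": {"index": "transform_baru"},
    "pivot": {
        # Step 1 and 4: group by Email only.
        "group_by": {"Email.keyword": {"terms": {"field": "Email.keyword"}}},
        # Step 2 and 4: latest_doc carries all remaining fields of the newest document.
        "aggregations": {
            "latest_doc": {
                "scripted_metric": {
                    "init_script": INIT_SCRIPT,
                    "map_script": MAP_SCRIPT,
                    "combine_script": COMBINE_SCRIPT,
                    "reduce_script": REDUCE_SCRIPT,
                }
            }
        },
    },
}

body = json.dumps(transform, indent=2)
print(body)
```

In the destination index, each Email bucket should then hold the fields of its newest document under latest_doc, which is why step 5 says the preview may look incomplete.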