How can I make Kibana graph by a substring or regex of a field?

I have an ElasticSearch instance with Kibana, holding a lot of user-level app data that I've accumulated over a few years. One of the fields is the Java version the user is running.

I'd like to graph Java versions over time, so I can have an idea whether it's reasonable to transition to a newer version. Unfortunately I can't find a way to aggregate 1.6.0_31, 1.6.0_32, 1.6.0_37, 1.6.0_51 as a single 1.6 entry, so the graph is nearly unreadable right now.

Is there a way in Kibana to aggregate the data, like a 'scripted field' that I could write a regex for? E.g. simplified_java: osjv % '\d\.\d', which would define simplified_java as the part of the osjv field that matches a digit followed by a dot followed by a digit.

Currently it looks like Kibana only supports numeric scripted fields, which makes this hard. I'm not using LogStash, as I'm not really using 'logs', but rather a custom event reporting framework in my desktop application that (opt-in) reports usage statistics, so unfortunately I can't use any of its features.

I can manually do it, but I've already imported 2G of event data, and I'd hate to have to do it again, adding a new field just for what should be computable... :(

Is there a way to create a field based on a substring or regex in Kibana, or (failing that) a way to tell ElasticSearch to transparently do the same thing?

Solution

You can definitely do scripted fields in Kibana against string data in Elasticsearch, provided it is mapped as a keyword type. See the scripted field documentation for a tiny bit of info, and the scripted field blog post for better examples.

In short, you could do what you're looking for by building a scripted field that returns a substring:

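def version = doc['osjv'].value;
// Cut at the last dot: 1.6.0_31 -> 1.6; values without a dot pass through unchanged
def lastDot = (version != null) ? version.lastIndexOf('.') : -1;
return (lastDot > 0) ? version.substring(0, lastDot) : version;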

Keep in mind that scripted fields have a performance cost: the script is evaluated at query time for every matching document, each time you load a visualization that uses the field.
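Since the question asks about a regex specifically, here is a rough sketch of a regex-based variant of the same scripted field. It assumes osjv is mapped as keyword and that regexes are allowed in Painless on your cluster (depending on your Elasticsearch version, this may need the script.painless.regex.enabled setting). The "unknown" label is just a placeholder I made up for documents without a value, and like the script above this is untested:

```painless
// Kibana scripted field sketch (language: painless, type: string).
// Assumption: osjv is a keyword field and Painless regexes are enabled.
if (doc['osjv'].size() == 0) {
  return "unknown"; // hypothetical placeholder for documents with no Java version
}
// Capture the leading "digits.digits" prefix, e.g. 1.6.0_31 -> 1.6
def m = /^([0-9]+\.[0-9]+)/.matcher(doc['osjv'].value);
// Fall back to the raw value when the pattern does not match
return m.find() ? m.group(1) : doc['osjv'].value;
```

Compared with cutting at the last dot, anchoring on the first two numeric components does not depend on how many dots the version string contains.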

A better approach could be to add a new field holding the simplified_java value to your documents. You don't need to re-ingest all your data from source; you can run an Update By Query instead. The query is just match_all, and the script creates the new field. The documents are still reindexed, but this happens "in place" in the existing index:

POST your-index/_update_by_query
{
  "script": {
    "source": "def version = ctx._source.osjv; ctx._source.simplified_java = (version != null && version.lastIndexOf('.') > 0) ? version.substring(0, version.lastIndexOf('.')) : version",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}
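To check what the update produced, a terms aggregation over the new field should list each simplified version with its document count. This is only a sketch: java_versions is an arbitrary aggregation name, and the .keyword suffix assumes the new field was dynamically mapped as text with a keyword sub-field, which is the default for new string fields in recent Elasticsearch versions. If you mapped simplified_java explicitly as keyword, aggregate on simplified_java directly.

```console
GET your-index/_search
{
  "size": 0,
  "aggs": {
    "java_versions": {
      "terms": { "field": "simplified_java.keyword" }
    }
  }
}
```

In Kibana you will probably need to refresh the index pattern's field list before the new field shows up in visualizations.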

I haven't tested either the scripted field or the update script, but the working versions should look something like them!