Split the values of one field into multiple fields in Elasticsearch


I imported OSM data into an Elasticsearch index with GDAL/OGR. A document now looks like this:

        {
            "_index": "points",
            "_id": "ttZCjIwB1FY7TO4ET-AK",
            "_score": 6.103035,
            "_source": {
                "ogc_fid": 24862,
                "geometry": {
                    "type": "Point",
                    "coordinates": [
                        8.7037536,
                        48.8916509
                    ]
                },
                "osm_id": "3330289083",
                "name": "Amt für Umweltschutz",
                "other_tags": "\"addr:city\"=>\"Pforzheim\",\"addr:country\"=>\"DE\",\"addr:housenumber\"=>\"9\",\"addr:postcode\"=>\"75175\",\"addr:street\"=>\"Östliche Karl-Friedrich-Straße\",\"government\"=>\"environment\",\"office\"=>\"government\""
            }
        }

Because of the OSM data structure, the address data was imported into the field other_tags.

To make Elasticsearch queries easier, I would prefer an index structure like this:

        {
            "_index": "points",
            "_id": "ttZCjIwB1FY7TO4ET-AK",
            "_score": 6.103035,
            "_source": {
                "ogc_fid": 24862,
                "geometry": {
                    "type": "Point",
                    "coordinates": [
                        8.7037536,
                        48.8916509
                    ]
                },
                "osm_id": "3330289083",
                "name": "Amt für Umweltschutz",
                "city": "Pforzheim",
                "country": "DE",
                "housenumber": "9",
                "postcode": "75175",
                "street": "Östliche Karl-Friedrich-Straße"
            }
        }
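As a rough illustration of the target mapping (not a Logstash or Elasticsearch API call), the sketch below parses the other_tags string in plain Python. The function name split_other_tags, the regular expression, and the choice to keep only addr:* tags with the prefix dropped are assumptions made for this example.

```python
import re

# Matches one "key"=>"value" pair; backslash-escaped characters inside
# the quotes are tolerated, and commas inside a value do not split it.
PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"=>"((?:[^"\\]|\\.)*)"')


def split_other_tags(source, prefix="addr:"):
    """Return a copy of `source` with addr:* tags lifted to top-level fields.

    Assumption for this sketch: tags without the prefix are dropped,
    and the other_tags field itself is not carried over.
    """
    flat = {k: v for k, v in source.items() if k != "other_tags"}
    for key, value in PAIR.findall(source.get("other_tags") or ""):
        if key.startswith(prefix):
            flat[key[len(prefix):]] = value
    return flat


doc = {
    "osm_id": "3330289083",
    "name": "Amt für Umweltschutz",
    "other_tags": '"addr:city"=>"Pforzheim","addr:country"=>"DE",'
                  '"addr:housenumber"=>"9","addr:postcode"=>"75175",'
                  '"addr:street"=>"Östliche Karl-Friedrich-Straße",'
                  '"government"=>"environment","office"=>"government"',
}

print(split_other_tags(doc))
```

Matching whole quoted pairs rather than splitting on commas keeps values that themselves contain a comma in one piece.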

I have read that data can be mutated with Logstash, and there are many examples of what a mutate filter could look like. I searched for four hours, but I couldn't find out how to actually use Logstash with its mutate filter.

Could anyone tell me exactly which steps are needed to achieve my goal? Do I need to install something? Do I need to connect to the Elasticsearch server and run some commands? Or can this be done through the Elasticsearch API?

Thanks


Solution

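  • I found out that documents can be transformed while reindexing, which works fine in my case, too. Note that the script below copies every tag under its original key (for example addr:city rather than city) and leaves other_tags in place.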

    POST {{url-elasticsearch}}/_reindex
    {
      "source": {
        "index": "points"
      },
      "dest": {
        "index": "points-wellformed"
      },
      "script": {
        "lang":"painless",
        "source": "if (ctx._source['other_tags'] != null) {def fieldSplit = ctx._source['other_tags'].splitOnToken(','); Map m = new HashMap(); for (item in fieldSplit) { def valueSplit = item.splitOnToken('=>'); if(valueSplit.length == 2) {m.put(valueSplit[0].replace('\"',''), valueSplit[1].replace('\"',''));}} for (entry in m.entrySet()){ctx._source[entry.getKey()] = entry.getValue(); }}"
      }
    }
    

    I would still appreciate faster or more elegant solutions, if someone knows a better way.
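The one-line script above is hard to read and easy to break when escaping quotes by hand. One possible way around that, sketched here in Python, is to keep the Painless source as a multi-line string and let json.dumps produce the request body. The helper name build_reindex_body is an assumption for this example; the Painless logic is the same as in the script above, so it still splits on every comma and will mis-split tag values that contain one.

```python
import json

# Same Painless logic as the one-line script, written out readably.
# json.dumps takes care of escaping the double quotes and newlines.
PAINLESS = """
if (ctx._source['other_tags'] != null) {
  def fieldSplit = ctx._source['other_tags'].splitOnToken(',');
  Map m = new HashMap();
  for (item in fieldSplit) {
    def valueSplit = item.splitOnToken('=>');
    if (valueSplit.length == 2) {
      m.put(valueSplit[0].replace('"', ''), valueSplit[1].replace('"', ''));
    }
  }
  for (entry in m.entrySet()) {
    ctx._source[entry.getKey()] = entry.getValue();
  }
}
"""


def build_reindex_body(source_index, dest_index):
    """Build the JSON body for a _reindex request (hypothetical helper)."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": PAINLESS.strip()},
    }


body = json.dumps(build_reindex_body("points", "points-wellformed"), indent=2)
print(body)
```

The printed JSON can then be sent as the body of the POST to _reindex with any HTTP client, using the Content-Type application/json.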