I imported OSM data into an Elasticsearch index using GDAL/OGR. A document in the index now looks like this:
{
  "_index": "points",
  "_id": "ttZCjIwB1FY7TO4ET-AK",
  "_score": 6.103035,
  "_source": {
    "ogc_fid": 24862,
    "geometry": {
      "type": "Point",
      "coordinates": [
        8.7037536,
        48.8916509
      ]
    },
    "osm_id": "3330289083",
    "name": "Amt für Umweltschutz",
    "other_tags": "\"addr:city\"=>\"Pforzheim\",\"addr:country\"=>\"DE\",\"addr:housenumber\"=>\"9\",\"addr:postcode\"=>\"75175\",\"addr:street\"=>\"Östliche Karl-Friedrich-Straße\",\"government\"=>\"environment\",\"office\"=>\"government\""
  }
}
Because of the OSM data structure, the address data was imported into the single field other_tags.
To make Elasticsearch queries easier, I would prefer a document structure like this:
{
  "_index": "points",
  "_id": "ttZCjIwB1FY7TO4ET-AK",
  "_score": 6.103035,
  "_source": {
    "ogc_fid": 24862,
    "geometry": {
      "type": "Point",
      "coordinates": [
        8.7037536,
        48.8916509
      ]
    },
    "osm_id": "3330289083",
    "name": "Amt für Umweltschutz",
    "city": "Pforzheim",
    "country": "DE",
    "housenumber": "9",
    "postcode": "75175",
    "street": "Östliche Karl-Friedrich-Straße"
  }
}
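To make the intended transformation concrete, here is a minimal Python sketch of it, run locally on one `_source` object rather than inside Elasticsearch. The function names `parse_other_tags` and `flatten_address` are hypothetical, and the sketch assumes the field always has the `"key"=>"value"` form shown above. It does not unescape backslash sequences inside values.

```python
import re

# One "key"=>"value" pair. Matching on the quotes (instead of splitting on
# commas) keeps values that themselves contain a comma intact.
PAIR_RE = re.compile(r'"((?:[^"\\]|\\.)*)"=>"((?:[^"\\]|\\.)*)"')


def parse_other_tags(other_tags):
    """Return the tags of an other_tags string as a dict (empty if None)."""
    return {key: value for key, value in PAIR_RE.findall(other_tags or "")}


def flatten_address(source):
    """Copy addr:* tags to top-level fields (addr:city -> city).

    The other_tags field itself and all non-address tags are dropped,
    which matches the target structure shown above.
    """
    doc = {k: v for k, v in source.items() if k != "other_tags"}
    for key, value in parse_other_tags(source.get("other_tags")).items():
        if key.startswith("addr:"):
            doc[key[len("addr:"):]] = value
    return doc


sample = {
    "osm_id": "3330289083",
    "name": "Amt für Umweltschutz",
    "other_tags": '"addr:city"=>"Pforzheim","addr:country"=>"DE","addr:housenumber"=>"9","addr:postcode"=>"75175","addr:street"=>"Östliche Karl-Friedrich-Straße","government"=>"environment","office"=>"government"',
}
result = flatten_address(sample)
```

Note that the tag keys are `addr:city`, `addr:street` and so on, so the `addr:` prefix has to be stripped explicitly to end up with field names like `city` and `street`.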
I have read that it is possible to mutate data with Logstash, and there are many examples of what a mutate filter could look like. I searched for four hours, but I couldn't find out how to actually set up and run Logstash with its mutate filter.
Could anyone tell me exactly what steps I need to take to achieve my goal? Do I need to install something? Do I need to connect to the Elasticsearch server and run some commands? Or can this be done through requests to the Elasticsearch API?
Thanks
I found out that it is possible to change the document structure while reindexing the index, which would be fine in my case, too:
POST {{url-elasticsearch}}/_reindex
{
  "source": {
    "index": "points"
  },
  "dest": {
    "index": "points-wellformed"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source['other_tags'] != null) {def fieldSplit = ctx._source['other_tags'].splitOnToken(','); Map m = new HashMap(); for (item in fieldSplit) { def valueSplit = item.splitOnToken('=>'); if(valueSplit.length == 2) {m.put(valueSplit[0].replace('\"',''), valueSplit[1].replace('\"',''));}} for (entry in m.entrySet()){ctx._source[entry.getKey()] = entry.getValue(); }}"
  }
}
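Since hand-escaping a one-line script inside JSON is error-prone, the request body can also be generated. The following is only a sketch: it builds the same body in Python with the standard `json` module, keeps the Painless logic from the request above but written across several lines, and does not send anything. The variable names `PAINLESS`, `body` and `body_json` are my own.

```python
import json

# Same logic as the one-line script above, written readably.
# json.dumps takes care of escaping the double quotes and newlines.
PAINLESS = """
if (ctx._source['other_tags'] != null) {
  def fieldSplit = ctx._source['other_tags'].splitOnToken(',');
  Map m = new HashMap();
  for (item in fieldSplit) {
    def valueSplit = item.splitOnToken('=>');
    if (valueSplit.length == 2) {
      m.put(valueSplit[0].replace('"', ''), valueSplit[1].replace('"', ''));
    }
  }
  for (entry in m.entrySet()) {
    ctx._source[entry.getKey()] = entry.getValue();
  }
}
"""

body = {
    "source": {"index": "points"},
    "dest": {"index": "points-wellformed"},
    "script": {"lang": "painless", "source": PAINLESS},
}

# Serialized request body, ready to be POSTed to the _reindex endpoint
# with any HTTP client.
body_json = json.dumps(body, indent=2)
```

As written, the script creates fields named after the full tag key (for example `addr:city`, not `city`), copies the non-address tags as well, and splits on every comma, so a value that contains a comma would be skipped or truncated.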
I would still appreciate a faster or more elegant solution if someone knows a better way.