Tags: mongodb, elasticsearch, elasticsearch-mongo-river

ElasticSearch river from Mongo messing up field mappings


I'm using Mongo, Elastic Search and this river plugin: https://github.com/richardwilly98/elasticsearch-river-mongodb

The setup works in that the river keeps the ES data in sync whenever Mongo is updated, but the river copies every property of each Mongo document into ES, and I only want a small subset of those fields. For example, if a Mongo doc has 30 properties, all 30 end up in ES instead of only the 5 that I want. I assume the issue is with the mappings. I've followed several docs and another Stack Overflow thread ("curl -X POST -d @mapping.json + mapping not created"), but it still is not working for me. Here is what I'm doing:

I'm creating my index with:

curl -XPOST "http://localhost:9200/mongoindex" -d @index.json

index.json:

{
  "settings" : {
    "number_of_shards" : 1,
    "analysis" : {
      "analyzer" : {
        "str_search_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase"]
        },
        "str_index_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase", "ngram"]
        }
      },
      "filter" : {
        "ngram" : {
          "type" : "ngram",
          "min_gram" : 2,
          "max_gram" : 20
        }
      }
    }
  }
}
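
As a rough illustration of what the two analyzers in index.json are meant to produce, here is a minimal Python sketch. It is an approximation written for this post, not Elasticsearch code: the function names are made up, and the order in which the real ngram filter emits its grams may differ.

```python
def index_side_grams(text, min_gram=2, max_gram=20):
    """Approximate str_index_analyzer: keyword tokenizer, lowercase, ngram filter."""
    # The keyword tokenizer keeps the whole input as a single token.
    token = text.lower()
    grams = []
    for size in range(min_gram, max_gram + 1):
        # Slide a window of `size` characters across the token.
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def search_side_token(text):
    """Approximate str_search_analyzer: keyword tokenizer, lowercase, no ngrams."""
    return text.lower()


if __name__ == "__main__":
    indexed = index_side_grams("Loft")
    print(indexed)
    # A query matches when its single lowercased token equals one of the indexed grams.
    print(search_side_token("OF") in indexed)
```

Because the search-side analyzer only lowercases, a query such as "OF" becomes "of", which is one of the grams indexed for "Loft".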

Then running:

curl -XPOST "http://localhost:9200/mongoindex/listing/_mapping" -d @mapping.json

With this data:

{
   "listing":{
      "properties":{
        "_all": {
          "enabled": false
        },
        "title": {
          "type": "string",
          "store": false,
          "index": "not_analyzed"
        },
        "bathrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "bedrooms": {
          "type": "integer",
          "store": true,
          "index": "analyzed"
        },
        "address": {
          "type": "nested",
          "include_in_parent": true,
          "store": true,
            "properties": {
              "counrty": {
                "type":"string"
              },
              "city": {
                "type":"string"
              },
              "stateOrProvince": {
                "type":"string"
              },
              "fullStreetAddress": {
                "type":"string"
              },
              "postalCode": {
                "type":"string"
              }
            }
        },
        "location": {
          "type": "geo_point",
          "full_name": "geometry.coordiantes",
          "store": true
        }
      }
   }
}

Then finally creating the river with:

curl -XPUT "http://localhost:9200/_river/mongoindex/_meta" -d @river.json

river.json:

{
  "type": "mongodb",
  "mongodb": {
    "db": "blueprint",
    "collection": "Listing",
    "options": {
      "secondary_read_preference": true,
      "drop_collection": true
    }
  },
  "index": {
    "name": "mongoindex",
    "type": "listing"
  }
}

After all that, the river works in that ES is populated, but the result is a verbatim copy of Mongo. I need to change the mappings, and my changes are not taking effect. What am I missing?

This is what my mapping looks like after the river runs. It is nothing like what I want.

[Screenshot: ES mapping]


Solution

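  • It turns out the problem was that the dynamic property was missing from the mapping config. Setting it to false tells Elasticsearch to ignore fields that are not listed under properties instead of adding them to the mapping. It needs to be set in two places: in index.json (the file shown above) and in mapping.json: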

    {
       "listing":{
          "_source": {
            "enabled": false
          },
          "dynamic": false,      // <--- Need to add this
          "properties":{
            "_all": {
              "enabled": false
            },
            "title": {
              "type": "string",
              "store": false,
              "index_analyzer": "str_index_analyzer"
            },
            "bathrooms": {
              "type": "integer",
              "store": true,
              "index": "analyzed"
            },
            "bedrooms": {
              "type": "integer",
              "store": true,
              "index": "analyzed"
            },
            "address": {
              "type": "nested",
              "include_in_parent": true,
              "store": true,
                "properties": {
                  "counrty": {
                    "type":"string",
                    "index_analyzer": "str_index_analyzer"
                  },
                  "city": {
                    "type":"string",
                    "index_analyzer": "str_index_analyzer"
                  },
                  "stateOrProvince": {
                    "type":"string",
                    "index_analyzer": "str_index_analyzer"
                  },
                  "fullStreetAddress": {
                    "type":"string",
                    "index_analyzer": "str_index_analyzer"
                  },
                  "postalCode": {
                    "type":"string"
                  }
                }
            },
            "location": {
              "type": "geo_point",
              "full_name": "geometry.coordiantes",
              "store": true
            }
          }
       }
    }
    

    As for the 902 docs vs. 451: I think that is a bug in the ElasticSearch Head plugin I'm using to browse documents. The index doesn't contain duplicates, but a couple of places in the plugin show 902 docs as a summary of sorts.
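
To make the effect of dynamic: false concrete, here is a small Python sketch of the idea. It only models the behaviour and is not how Elasticsearch or the river is implemented; the helper name and the sample fields agentNotes and internalCode are hypothetical. Note that dynamic: false on its own only keeps unmapped fields out of the mapping and the index. They would still come back as part of _source, which the mapping above disables.

```python
def keep_mapped_fields(document, properties):
    """Model dynamic: false by keeping only fields declared under "properties"."""
    kept = {}
    for name, value in document.items():
        field = properties.get(name)
        if field is None:
            # Unmapped field: ignored rather than added to the mapping.
            continue
        sub_properties = field.get("properties")
        if sub_properties is not None and isinstance(value, dict):
            # Inner objects inherit the setting, so filter them the same way.
            kept[name] = keep_mapped_fields(value, sub_properties)
        else:
            kept[name] = value
    return kept


if __name__ == "__main__":
    # Trimmed-down version of the "listing" mapping from this post.
    listing_properties = {
        "title": {"type": "string"},
        "bedrooms": {"type": "integer"},
        "address": {
            "type": "nested",
            "properties": {"city": {"type": "string"}},
        },
    }
    # Hypothetical Mongo document with extra fields that should be ignored.
    mongo_doc = {
        "title": "Sunny loft",
        "bedrooms": 2,
        "agentNotes": "call after 5pm",
        "address": {"city": "Austin", "internalCode": "X-17"},
    }
    print(keep_mapped_fields(mongo_doc, listing_properties))
```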