I have some JSON that looks like the following. Let's call that field metadata:
{
  "somekey1": "val1",
  "someotherkey2": "val2",
  "more_data": {
    "contains_more": [
      {
        "foo": "val5",
        "bar": "val6"
      },
      {
        "foo": "val66",
        "baz": "val44"
      }
    ],
    "even_more": {
      "foz": 1234
    }
  }
}
This is just a simple example; the real data can be considerably more complex. Keys can occur multiple times, as can values, and values can be int or str.
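To make the shape of the data concrete, here is a minimal Python sketch that lists every leaf of the sample as a dotted key path. The helper name iter_paths is made up for illustration and is not part of Haystack or Elasticsearch; it only shows that the same key path (more_data.contains_more.foo) can carry several values.

```python
def iter_paths(node, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else key
            yield from iter_paths(value, child)
    elif isinstance(node, list):
        # List items share the path of the list itself.
        for item in node:
            yield from iter_paths(item, prefix)
    else:
        yield prefix, node


metadata = {
    "somekey1": "val1",
    "someotherkey2": "val2",
    "more_data": {
        "contains_more": [
            {"foo": "val5", "bar": "val6"},
            {"foo": "val66", "baz": "val44"},
        ],
        "even_more": {"foz": 1234},
    },
}

paths = list(iter_paths(metadata))
for path, value in paths:
    print(path, "=", value)
```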
Now the first problem is that I'm not sure how to index this correctly in Elasticsearch so that I can find documents with specific queries.
I am using Django/Haystack where the index looks like this:
from haystack import indexes

class FooIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    metadata = indexes.CharField(model_attr='get_metadata')
    # and some more specific fields
And the template:
{
  "foo": {{ object.foo }},
  "metadata": {{ object.metadata }},
  # and some more
}
The metadata will then be filled with the sample above and the result will look like this:
{
  "foo": "someValue",
  "metadata": {
    "somekey1": "val1",
    "someotherkey2": "val2",
    "more_data": {
      "contains_more": [
        {
          "foo": "val5",
          "bar": "val6"
        },
        {
          "foo": "val66",
          "baz": "val44"
        }
      ],
      "even_more": {
        "foz": 1234
      }
    }
  }
}
All of this goes into the 'text' field in Elasticsearch.
So the goal is now to be able to search for specific key/value pairs inside the metadata.
The second problem: when I search, e.g., for foo: val5, it matches all objects that just have the key "foo" and all objects that have val5 somewhere else in their structure.
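A rough way to see why this can happen: if the whole structure is rendered into one text field, the key and the value become independent words. The sketch below is only a simulation under stated assumptions (a simple word tokenizer and OR matching between query terms); it is not Elasticsearch's actual analyzer or scoring, and the function names are invented for illustration.

```python
import json
import re


def tokens(doc):
    """Lowercased word tokens of the document serialized as one string."""
    return set(re.findall(r"\w+", json.dumps(doc).lower()))


def matches_any(doc, query):
    """True if at least one query word occurs anywhere in the document text."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    return bool(tokens(doc) & query_terms)


# Has the pair foo=val5: the hit that is actually wanted.
doc_a = {"metadata": {"more_data": {"contains_more": [{"foo": "val5"}]}}}
# Only has the key "foo", with a different value.
doc_b = {"metadata": {"more_data": {"contains_more": [{"foo": "val66"}]}}}
# Only has the value "val5", under a different key.
doc_c = {"metadata": {"somekey1": "val5"}}
# Has neither the key nor the value.
doc_d = {"metadata": {"somekey1": "val1"}}

for name, doc in [("a", doc_a), ("b", doc_b), ("c", doc_c), ("d", doc_d)]:
    print(name, matches_any(doc, "foo: val5"))
```

Under these assumptions, documents a, b and c all match, because the structure that ties "foo" to "val5" is lost once everything is a single string.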
This is how I search in Django:
self.searchqueryset.auto_query(self.cleaned_data['q'])
Sometimes the results are "okayish"; sometimes they are completely useless.
I could use a pointer in the right direction and would like to know which mistakes I made here. Thank you!
Edit: I added my final solution as an answer below!
It took a while to figure out a solution that works for me.
It was a mix of both answers provided by @juliendangers and @Val, plus some more customizing.
Added a custom get_type_mapping method to the model:
@classmethod
def get_type_mapping(cls):
    return {
        "properties": {
            "somekey": {
                "type": "<specific_type>",
                "format": "<specific_format>",
            },
            "more_data": {
                "type": "nested",
                "include_in_parent": True,
                "properties": {
                    "even_more": {
                        "type": "nested",
                        "include_in_parent": True,
                    },
                    # and so on for each level you care about
                }
            }
        }
    }
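For anyone wondering why the nested type matters: by default Elasticsearch flattens an array of objects into multi-valued fields, so the link between the keys of one object is lost, while a nested mapping keeps each object separate. The following pure-Python sketch only imitates that difference for an array such as contains_more (assuming that level were mapped as nested too); the helper names are made up and no Elasticsearch call is involved.

```python
def flatten_as_object(items):
    """Merge a list of dicts into one dict of multi-valued fields."""
    merged = {}
    for item in items:
        for key, value in item.items():
            merged.setdefault(key, []).append(value)
    return merged


def match_flat(merged, **criteria):
    """Each criterion is checked against the merged values independently."""
    return all(value in merged.get(key, []) for key, value in criteria.items())


def match_nested(items, **criteria):
    """All criteria must hold within one and the same object."""
    return any(
        all(item.get(key) == value for key, value in criteria.items())
        for item in items
    )


contains_more = [
    {"foo": "val5", "bar": "val6"},
    {"foo": "val66", "baz": "val44"},
]
flat = flatten_as_object(contains_more)

# foo=val5 and baz=val44 never occur together in a single object.
print(match_flat(flat, foo="val5", baz="val44"))
print(match_nested(contains_more, foo="val5", baz="val44"))
print(match_nested(contains_more, foo="val5", bar="val6"))
```

The flattened view reports a match for the cross-object combination, the nested view does not. With include_in_parent set, the fields are additionally available in flattened form on the parent, which is what lets a plain query_string search reach them.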
Added a custom get_document method to the model:
@classmethod
def get_document(cls, obj):
    return {
        'somekey': obj.somekey,
        'more_data': obj.more_data,
        # and so on
    }
Added a custom search form:
from django import forms
# ElasticsearchForm and ElasticsearchProcessor come from the Elasticsearch
# integration library used in the project.

class Searchform(ElasticsearchForm):
    q = forms.CharField(required=False)

    def get_index(self):
        return 'your_index'

    def get_type(self):
        return 'your_model'

    def prepare_query(self):
        # Fall back to a match-everything query when no search term is given.
        if not self.cleaned_data['q']:
            q = "*"
        else:
            q = str(self.cleaned_data['q'])
        return {
            "query": {
                "query_string": {
                    "query": q
                }
            }
        }

    def search(self):
        esp = ElasticsearchProcessor(self.es)
        esp.add_search(self.prepare_query, page=1, page_size=25, index=self.get_index(), doc_type=self.get_type())
        responses = esp.search()
        return responses[0]
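Since the form passes q straight into query_string, a search can be limited to one key by using the field:value syntax with the dotted path of the field. As a small sketch (field_query is a hypothetical helper, not part of any library), the request body for such a search could be built like this:

```python
def field_query(path, value):
    """Build a query_string body restricted to one dotted field path."""
    if isinstance(value, str):
        # Quote string values so they are treated as one phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        term = f'"{escaped}"'
    else:
        # Numbers are passed through unquoted.
        term = str(value)
    return {"query": {"query_string": {"query": f"{path}:{term}"}}}


body = field_query("more_data.even_more.foz", 1234)
print(body)
```

Typing more_data.even_more.foz:1234 directly into the search box should have the same effect, assuming the mapping above is in place.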
So this is what worked for me and covers my use cases. Maybe it can be of some help to someone.