ruby-on-rails  ruby-on-rails-4  elasticsearch  wysiwyg  tire

Rails 4: Clean text from WYSIWYG editors in an Elasticsearch index


I have an index called offers that I'm trying to run full-text search against with Elasticsearch, using the Tire gem.

My model has a description field, but its input comes from a WYSIWYG editor. When I inspect the indexed data in Elasticsearch, the description field contains all the <p> tags, the \n newlines, and other markup characters, like this:

<h2>Qu&eacute; hay en la caja:</h2>\r\n\r\n<ul>\r\n\t<li>Tablet KRONO 7021</li>\r\n\t<li>Cable USB</li>\r\n\t<li>

My question is: does this text need to be decoded in Elasticsearch so that the markup does not affect full-text search?


Solution

  • You absolutely should decode your text. Two options:

    Save the text as two separate fields, one with the WYSIWYG markup and one clean, and search against the clean field. This is problematic if you have a lot of entries, since every description is stored twice.

    Use Elasticsearch's html_strip character filter ("char_filter": [ "html_strip" ]) in the analyzer for that field. You will have to test it manually to see how well it works in your case.
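
    A minimal sketch of both options in plain Ruby, to make them concrete. The names plain_text, INDEX_SETTINGS and html_text are hypothetical and chosen for illustration. The regex-based stripping is a crude stand-in for a real sanitizer, and the settings hash only shows the shape of a custom analyzer that uses html_strip; how you hand it to your index depends on your client.

```ruby
require 'cgi'
require 'json'

# Option 1 (sketch): build a clean copy of the description to index alongside
# the original HTML. The regex is only an approximation; in a Rails app a
# proper sanitizer (for example ActionView's strip_tags helper) is likely a
# better choice.
def plain_text(html)
  text = html.gsub(/<[^>]*>/, ' ')  # replace every tag with a space
  # CGI.unescapeHTML only decodes &amp; &lt; &gt; &quot; &apos; and numeric
  # entities. Named entities such as &eacute; are left untouched here.
  text = CGI.unescapeHTML(text)
  text.gsub(/\s+/, ' ').strip       # collapse \r, \n, \t and repeated spaces
end

# Option 2 (sketch): index settings defining a custom analyzer that runs the
# built-in html_strip character filter before tokenizing.
# "html_text" is a made-up analyzer name.
INDEX_SETTINGS = {
  analysis: {
    analyzer: {
      html_text: {
        type: 'custom',
        tokenizer: 'standard',
        char_filter: ['html_strip'],
        filter: ['lowercase']
      }
    }
  }
}.freeze

sample = "<h2>Cable &amp; USB</h2>\r\n\r\n<ul>\r\n\t<li>Tablet KRONO 7021</li>\r\n</ul>"
puts plain_text(sample)
puts JSON.pretty_generate(INDEX_SETTINGS)
```

    With Tire, a hash like this can presumably be passed to the model's settings block, with the description field mapped to the html_text analyzer (something along the lines of indexes :description, analyzer: 'html_text'). Note that html_strip only changes the analyzed terms; the stored _source still contains the original HTML, which is usually what you want for display.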