Tags: node.js, elasticsearch, elasticsearch-plugin, elasticsearch-2.0, elasticsearch-mapping

How to index a binary file in Elasticsearch without using Base64


I'm using the Node.js elasticsearch package to interact with Elasticsearch. I have a document that has a file field. I want to be able to upload a file to the index, but the only way I have found is to use the elasticsearch-mapper-attachments plugin.

The problem is that if I use it, I have to load the whole file into memory, encode it to Base64, and then pass the resulting string to Elasticsearch.

I'd like to be able to pass a Stream to ElasticSearch (referencing any binary file: pdf, xls, doc, ppt).


Solution

  • The elasticsearch-mapper-attachments plugin parses the uploaded binary file and extracts its text for indexing, using its built-in Tika extractor.

    Some applications (for example, Search Technologies' Aspire) instead run the binaries through Tika locally, extract the text, and upload only that text with the documents to be indexed.

    It might not be the answer you are looking for, but you really have just two options: use the Elasticsearch plugin (and convert the binary to Base64 in your code before uploading the document to Elasticsearch), or parse the binary and extract the text in your own code and then upload just that text to Elasticsearch. The former is easier; the latter gives you more control over the process.
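
The two options above could look roughly like this in Node.js. This is only a sketch under several assumptions: the field names `file` and `file_text` are made up for illustration, `file` is assumed to be mapped as type `attachment` by the plugin, and `extractText` is a hypothetical function you would supply yourself (for example a wrapper around a local Tika process). `client` is assumed to be a `Client` from the `elasticsearch` npm package, or anything else exposing a compatible `index(params)` method.

```javascript
'use strict';
const fs = require('fs');

// Option 1: Base64-encode the file for a field mapped as type "attachment".
// Note that this reads the whole file into memory, which is exactly the
// limitation described in the question.
// Assumption: the attachment field is called "file".
function buildAttachmentBody(filePath) {
  const content = fs.readFileSync(filePath);
  return { file: content.toString('base64') };
}

// Option 2: extract the text locally and index only that text.
// `extractText` is a hypothetical async function: (filePath) => string.
// Assumption: the plain-text field is called "file_text".
async function buildTextBody(filePath, extractText) {
  const text = await extractText(filePath);
  return { file_text: text };
}

// Sends a prepared body to Elasticsearch through the given client.
function indexDocument(client, index, type, id, body) {
  return client.index({ index, type, id, body });
}
```

With a real client, usage might be `indexDocument(client, 'docs', 'document', '1', buildAttachmentBody('report.pdf'))`, where the index, type, id, and file name are placeholders. With the second option the binary never leaves your application, so you decide how to read it and how much of it is held in memory at once.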