I was previously using the now-deprecated mapper-attachments plugin, which was fairly easy to use along with normal indexing. Now that ingest-attachment has replaced it and requires a pipeline, I find it confusing how to use it properly.
Let's say I have a model named Media that has a file field containing the Base64-encoded file. I have the following mappings in the model file:
mapping '_source' => { :excludes => ['file'] } do
  indexes :id, type: :long, index: :not_analyzed
  indexes :name, type: :text
  indexes :visibility, type: :integer, index: :not_analyzed
  indexes :created_at, type: :date, include_in_all: false
  indexes :updated_at, type: :date, include_in_all: false
  # attachment specific mappings
  indexes 'attachment.title', type: :text, store: 'yes'
  indexes 'attachment.author', type: :text, store: 'yes'
  indexes 'attachment.name', type: :text, store: 'yes'
  indexes 'attachment.date', type: :date, store: 'yes'
  indexes 'attachment.content_type', type: :text, store: 'yes'
  indexes 'attachment.content_length', type: :integer, store: 'yes'
  indexes 'attachment.content', term_vector: 'with_positions_offsets', type: :text, store: 'yes'
end
I have created an attachment pipeline via curl:
curl -XPUT 'localhost:9200/_ingest/pipeline/attachment' -d'
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "file"
      }
    }
  ]
}'
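For anyone who would rather create the pipeline from Ruby than from curl, here is a minimal stdlib-only sketch of the same PUT request. The helper names build_pipeline_request and create_pipeline are hypothetical, the host and port are taken from the curl command, and the Content-Type header is an addition that newer Elasticsearch versions expect. Nothing is sent unless create_pipeline is called.

```ruby
require 'json'
require 'net/http'

# Same pipeline definition as in the curl command.
PIPELINE_BODY = {
  'description' => 'Extract attachment information',
  'processors'  => [{ 'attachment' => { 'field' => 'file' } }]
}.freeze

# Hypothetical helper: builds (but does not send) the PUT request.
def build_pipeline_request(name)
  request = Net::HTTP::Put.new("/_ingest/pipeline/#{name}",
                               'Content-Type' => 'application/json')
  request.body = JSON.generate(PIPELINE_BODY)
  request
end

# Hypothetical helper: sends the request to a running cluster.
# Host and port are assumptions copied from the curl example.
def create_pipeline(name, host: 'localhost', port: 9200)
  Net::HTTP.start(host, port) { |http| http.request(build_pipeline_request(name)) }
end

# Inspect the request without contacting Elasticsearch.
request = build_pipeline_request('attachment')
puts request.path
puts request.body
```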
Previously, a simple Media.last.__elasticsearch__.index_document would have been sufficient to index a record along with the actual file via the mapper-attachments plugin.
I'm not sure how to do this with ingest-attachment using a pipeline and the elasticsearch-rails gem.
I can do the following PUT via curl:
curl -XPUT 'localhost:9200/assets/media/68?pipeline=attachment' -d'
{ "file" : "my_really_long_encoded_file_string" }'
This indexes the encoded file, but it obviously doesn't index the rest of the model's data (or it overwrites that data completely if the record was previously indexed). I don't really want to include every single model attribute along with the file in a curl command. Is there a better or simpler way of doing this? Am I just completely off about how pipelines and ingest are supposed to work?
Finally figured this out. I needed to update the ES gems, specifically elasticsearch-api.
With the mappings and pipeline set up as I have them, you can simply do:
Media.last.__elasticsearch__.index_document pipeline: :attachment
or
Media.last.__elasticsearch__.update_document pipeline: :attachment
This will index everything correctly and your file will be properly parsed and indexed via the ingest pipeline.
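To connect this with the curl request in the question, here is a stdlib-only sketch of the kind of request such a call roughly amounts to: the model's attributes and the Base64-encoded file travel in one JSON body, and the pipeline name goes in the query string. The helper name build_index_request and the attribute values are hypothetical, and the index, type, and id are copied from the curl example. The gem builds the real request for you, so treat this as an illustration only.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: builds the path and JSON body of an index request
# that is routed through an ingest pipeline. It does not send anything.
def build_index_request(index:, type:, id:, attributes:, pipeline:)
  query = URI.encode_www_form(pipeline: pipeline)
  {
    method: 'PUT',
    path:   "/#{index}/#{type}/#{id}?#{query}",
    body:   JSON.generate(attributes)
  }
end

# The attachment processor reads Base64 from the `file` field.
# pack('m0') produces strict Base64 without line breaks; in the question
# the model's file field already holds the encoded string.
encoded_file = ['hello'].pack('m0')

# Illustrative attribute values, not taken from a real record.
request = build_index_request(
  index: 'assets', type: 'media', id: 68,
  attributes: { 'name' => 'report.pdf', 'visibility' => 1, 'file' => encoded_file },
  pipeline: :attachment
)

puts request[:path]
puts request[:body]
```

Because the whole document goes through the pipeline in one request, the other fields are not overwritten the way they are when only file is sent.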