
How to prevent attachments from being stored in _source with Elasticsearch and Tire?


I've got some PDF attachments being indexed in Elasticsearch, using the Tire gem. It's all working great, but I'm going to have many GB of PDFs, and we will likely store the PDFs in S3 for access. Right now the base64-encoded PDFs are being stored in Elasticsearch _source, which will make the index huge. I want to have the attachments indexed, but not stored, and I haven't yet figured out the right incantation to put in Tire's "mapping" block to prevent it. The block is like this right now:

mapping do
  indexes :id, :type => 'integer'
  indexes :title
  indexes :last_update, :type => 'date'
  indexes :attachment, :type => 'attachment'
end

I've tried some variations like:

indexes :attachment, :type => 'attachment', :_source => { :enabled => false }

It looks fine when I run the tire:import rake task, but it doesn't seem to make a difference. Does anyone know (a) whether this is possible, and (b) how to do it?
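One plausible reason the attempt has no effect: `_source` is a type-level setting, but an option passed to `indexes` ends up inside that single field's definition. The plain-Ruby sketch below rebuilds the mapping hash by hand to show where the option lands. It assumes that Tire copies each `indexes` option hash verbatim into the field's entry under `:properties`, and it uses a hypothetical type name `document`. It is an illustration, not Tire's actual output.

```ruby
require 'json'

# Hand-built approximation of the mapping produced by the attempted block.
# Assumption: Tire copies each `indexes` option hash verbatim into the
# field's definition. The type name :document is hypothetical.
properties = {
  :id          => { :type => 'integer' },
  :title       => { :type => 'string' },  # string type shown explicitly for clarity
  :last_update => { :type => 'date' },
  # The :_source option sits inside one field's definition, where it is
  # not a recognised field parameter, rather than at the type level.
  :attachment  => { :type => 'attachment', :_source => { :enabled => false } }
}

attempted_mapping = { :document => { :properties => properties } }

puts JSON.pretty_generate(attempted_mapping)
```

In the printed JSON, `_source` appears under `properties.attachment` and not next to `properties`, which is where Elasticsearch looks for it.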

Thanks in advance.


Solution

  • The _source field is configured for the whole type, not per field, and its settings can contain a list of fields that should be excluded from the stored source. I would guess that with Tire, something like this should do it:

    mapping :_source => { :excludes => ['attachment'] } do
      indexes :id, :type => 'integer'
      indexes :title
      indexes :last_update, :type => 'date'
      indexes :attachment, :type => 'attachment'
    end
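To make the intended result concrete, here is a plain-Ruby sketch of the type mapping the answer's `mapping :_source => { ... }` call is expected to send. It assumes that Tire merges the options hash given to `mapping` into the type-level mapping alongside `:properties`, and it uses a hypothetical type name `document`. Elasticsearch's `_source` mapping does accept an `excludes` list, so the attachment should still be indexed but omitted from the stored source.

```ruby
require 'json'

# Options passed to `mapping` (taken from the answer above).
mapping_options = { :_source => { :excludes => ['attachment'] } }

# Field definitions from the `indexes` calls.
properties = {
  :id          => { :type => 'integer' },
  :title       => { :type => 'string' },  # string type shown explicitly for clarity
  :last_update => { :type => 'date' },
  :attachment  => { :type => 'attachment' }
}

# Assumption: Tire merges the mapping options next to :properties at the
# type level. The type name :document is hypothetical.
type_mapping = { :document => mapping_options.merge(:properties => properties) }

puts JSON.pretty_generate(type_mapping)
```

Here `_source` sits beside `properties` at the type level, where the exclusion applies to every document of that type.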