
How do I upload a text document, and is it possible to enable full-text search on it?


I am using the following technologies:

  1. JRuby 1.7.4
  2. Rails 3.2.13
  3. Ubuntu 13.04
  4. DB2 C-Express
  5. Torquebox server 2.3.0

My goal is to make a simple controller which implements the following functions:

  1. Upload text documents (MS Word, OpenOffice, or LibreOffice formats)
  2. Perform a full-text search on the uploaded files
  3. Display the text files in the browser as PDFs

I have searched for gems that can help me achieve this and have the following questions:

  1. What column type should the field that stores the text file have? I assume it should be a binary type.
  2. Is it possible to perform full-text search using Sunspot? From what I have read, it seems to work with fields of type text, not binary.
  3. I have read about two gems that can generate PDFs: Prawn, which offers more flexibility, and PDFKit, which generates PDFs from HTML pages. Can either of these be used to display the text file? I suppose I would first have to render it somehow as HTML and then use the PDF gem.

Has anyone done something like this, and could you point me in the right direction?
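On question 2: Sunspot hands model attributes to Solr, which indexes text, so a binary column on its own gives it nothing to search. One common arrangement (an assumption here, not something the answer below prescribes) is to keep the uploaded file as-is and store its extracted plain text in a separate text column that Sunspot indexes. The toy inverted index below, built around a hypothetical `TinyIndex` class, only illustrates why the input to full-text search has to be plain text; Solr does this far more thoroughly (tokenizing, stemming, ranking, and so on).

```ruby
# Hypothetical toy inverted index, NOT Sunspot's or Solr's API.
# It maps each lowercased word to the ids of the documents containing it.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  # Index the plain text extracted from a document (not its binary bytes).
  def add(doc_id, text)
    text.downcase.scan(/\w+/).uniq.each { |word| @postings[word] << doc_id }
  end

  # Return the ids of documents that contain every word in the query (AND search).
  def search(query)
    words = query.downcase.scan(/\w+/)
    return [] if words.empty?
    # fetch (unlike []) does not create empty entries for unknown words
    words.map { |word| @postings.fetch(word, []) }.reduce(:&)
  end
end

index = TinyIndex.new
index.add(1, "Quarterly report on DB2 performance")
index.add(2, "Meeting notes: DB2 migration plan")
index.search("db2")        # => [1, 2]
index.search("db2 report") # => [1]
```

The same idea carries over to a real setup: whatever you search on must first be turned into text, which is why the extraction step matters as much as the indexing.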


Solution

  • I haven't done most of the things in your requirements, but I work quite heavily with a text parser that converts MS Word documents into XML documents. Perhaps I can at least get you started in the right direction on that part.

    We use a Java library from Apache called POI that makes the DOC -> XML conversion a simple process. Since you're using JRuby, I imagine it will be much easier for you to integrate into your project than it was for us, because we're using MRI Ruby. That was a PITA: we had to include lots of bridges and other junk just to be able to use the .jar files.
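    Under JRuby, the POI route would typically go through POI's extractor classes (for .docx, `org.apache.poi.xwpf.extractor.XWPFWordExtractor`). To show what text extraction produces without any Java dependency, here is a plain-Ruby sketch that takes a different approach: it reads the WordprocessingML XML stored inside a .docx file (`word/document.xml`) using REXML from the standard library. It assumes you have already pulled that entry out of the .docx zip archive (for example with the rubyzip gem), and it does not handle legacy binary .doc files or OpenDocument .odt files, which use different formats. The method name `docx_paragraphs` is hypothetical.

```ruby
require "rexml/document"

# Namespace used by WordprocessingML, the XML format inside a .docx file.
W_NS = { "w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main" }

# Hypothetical helper, not part of POI or any gem.
# Takes the contents of word/document.xml and returns one string per
# paragraph (<w:p>), joining the text runs (<w:t>) inside it.
def docx_paragraphs(document_xml)
  doc = REXML::Document.new(document_xml)
  REXML::XPath.match(doc, "//w:p", W_NS).map do |para|
    REXML::XPath.match(para, ".//w:t", W_NS).map { |t| t.text.to_s }.join
  end
end

xml = '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">' \
      '<w:body>' \
      '<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>' \
      '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>' \
      '</w:body></w:document>'
docx_paragraphs(xml) # => ["Hello, world", "Second paragraph"]
```

    Joining the paragraphs with newlines gives plain text suitable for storing in a text column; POI does the equivalent work for you and also covers the older .doc format.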

    Personally, I've used the CarrierWave gem to handle file uploads. It's a snap to upload files and attach them to models. You simply use the CarrierWave generator to generate an Uploader class that attaches to a field in a model, configure it to store and process the file based on your specifications, and PROFIT! The docs are great, but I'm happy to help you if you need it. If you need multi-file uploading, I explained in detail how I accomplished it in a different SO post.
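    CarrierWave uploaders can restrict which file extensions they accept through a whitelist method defined in the uploader class (named `extension_white_list` in older releases and `extension_allowlist` in current ones). The plain-Ruby sketch below only illustrates the check such a whitelist performs for the formats in the question; `ALLOWED_EXTENSIONS` and `allowed_document?` are hypothetical names, and in a real app the list would live in the uploader rather than in a helper like this.

```ruby
# Hypothetical list of accepted formats: MS Word plus OpenOffice/LibreOffice text documents.
ALLOWED_EXTENSIONS = %w[doc docx odt].freeze

# Hypothetical helper, not CarrierWave's API.
# Returns true when the filename ends in an allowed extension (case-insensitive).
def allowed_document?(filename)
  ext = File.extname(filename.to_s).delete(".").downcase
  ALLOWED_EXTENSIONS.include?(ext)
end

allowed_document?("Report.DOCX") # => true
allowed_document?("notes.odt")   # => true
allowed_document?("photo.png")   # => false
allowed_document?("README")      # => false
```

    An extension check only looks at the filename; it does not prove the file really is a Word or ODF document, so the extraction step should still be prepared to fail on bad input.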

    Hope that helps!