Tags: database, mongodb, postgresql, database-design, large-files

Loading chunks of large text files from a DB


So, as the title says: I need to serve chunks of multiple large text files stored in a database. The files can be anywhere from 100 lines to 10,000,000 lines long. I would serve them to multiple instances of a processing script in chunks of 10/50/100 lines. The question is: what is the proper way to store and serve them, assuming there could be hundreds of such files?

Which DB (and possibly additional tools) is the best choice? MongoDB + GridFS? Or is there an efficient way to serve them in such small chunks if I store them as BLOBs in PostgreSQL?


Solution

  • If you go for a relational database, I would store the files line by line in a table. That way it is easy to fetch lines:

    SELECT line FROM documents
    WHERE docname = 'mydoc'
      AND line_nr > 100
    ORDER BY line_nr
    FETCH FIRST 50 ROWS ONLY;
    

    A b-tree index on (docname, line_nr) would make the query very efficient; see the table and index sketch below.

    If you want to keep the table from getting too large, use range partitioning on docname (a partitioned variant is sketched in the second example below).
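
    For concreteness, here is a minimal DDL sketch of the table and index the query above assumes. The names documents, docname, line_nr, and line come from the query; the column types and the primary key are assumptions:

    -- Hypothetical schema matching the query above (PostgreSQL syntax).
    -- Column types and the primary key are assumptions, not given in the answer.
    CREATE TABLE documents (
        docname text   NOT NULL,  -- which file the line belongs to
        line_nr bigint NOT NULL,  -- line number within the file
        line    text   NOT NULL,  -- the line's content
        PRIMARY KEY (docname, line_nr)
    );

    -- The primary key already provides a b-tree index on (docname, line_nr);
    -- without it, an explicit index would look like this:
    CREATE INDEX documents_docname_line_nr_idx
        ON documents (docname, line_nr);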
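
    And a sketch of what range partitioning on docname could look like with PostgreSQL's declarative partitioning (PostgreSQL 11+, so the primary key can include the partition key). The partition names and bounds are made-up examples:

    -- Hypothetical partitioned variant of the same table.
    -- The partition bounds below are illustrative only.
    CREATE TABLE documents (
        docname text   NOT NULL,
        line_nr bigint NOT NULL,
        line    text   NOT NULL,
        PRIMARY KEY (docname, line_nr)
    ) PARTITION BY RANGE (docname);

    -- One partition per range of document names.
    CREATE TABLE documents_low PARTITION OF documents
        FOR VALUES FROM (MINVALUE) TO ('n');

    CREATE TABLE documents_high PARTITION OF documents
        FOR VALUES FROM ('n') TO (MAXVALUE);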