As the title says, I need to serve chunks of multiple large text files loaded into a database. The files can be anywhere from 100 lines to 10,000,000 lines, and I would serve them to multiple instances of a processing script in chunks of 10/50/100 lines. What is the proper way to store and serve them, assuming there could be hundreds of such files?
Which database, and which additional tools if any, is the best choice? MongoDB + GridFS? Or is there an efficient way to serve files in such small chunks if I store them as BLOBs in PostgreSQL?
If you go for a relational database, I would store the files line by line in a table. That way it is easy to fetch lines:
SELECT line FROM documents
WHERE docname = 'mydoc'
AND line_nr > 100
ORDER BY line_nr
FETCH FIRST 50 ROWS ONLY;
A b-tree index on (docname, line_nr)
would make the query very efficient.
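A minimal sketch of such a table, assuming the column names used in the query above (docname, line_nr, line); here the primary key doubles as the b-tree index:

-- One row per line of each file; column names match the query above.
CREATE TABLE documents (
    docname text   NOT NULL,
    line_nr bigint NOT NULL,
    line    text   NOT NULL,
    -- the primary key provides the b-tree index on (docname, line_nr)
    PRIMARY KEY (docname, line_nr)
);

With that index, fetching the next 50 lines of a document is a short index range scan, so the cost of serving a chunk does not grow with the size of the file.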
If you want to keep the table from getting too large, use range partitioning on docname.
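A sketch of what that could look like with declarative partitioning, assuming PostgreSQL 11 or later; the partition bounds are only examples and should be chosen to spread your documents evenly:

-- Same table as above, but split into partitions by ranges of document names.
CREATE TABLE documents (
    docname text   NOT NULL,
    line_nr bigint NOT NULL,
    line    text   NOT NULL,
    PRIMARY KEY (docname, line_nr)   -- must include the partition key
) PARTITION BY RANGE (docname);

-- Example partitions covering the whole key space.
CREATE TABLE documents_a_m PARTITION OF documents
    FOR VALUES FROM (MINVALUE) TO ('n');
CREATE TABLE documents_n_z PARTITION OF documents
    FOR VALUES FROM ('n') TO (MAXVALUE);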