c++ performance file large-files random-access

What is the best way to read a large (>2 GB) text file of Ethernet data and access the data randomly by different parameters?


I have a text file that looks like this:

0.001 ETH Rx 1 1 0 B45678810000000000000000AF0000 555
0.002 ETH Rx 1 1 0 B45678810000000000000000AF 23
0.003 ETH Rx 1 1 0 B45678810000000000000000AF156500
0.004 ETH Rx 1 1 0 B45678810000000000000000AF00000000635254

I need a way to read this file, turn each line into a structure, and send it to a client application.
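To make "form a structure" concrete, here is a minimal sketch of parsing one line. The `Frame` type, its field names, and `parseLine` are hypothetical; the meaning of the three integer columns and of the optional last column is a guess from the sample lines above.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type; field names are guesses based on the sample lines.
struct Frame {
    double timestamp = 0.0;  // seconds, first column
    std::string protocol;    // e.g. "ETH"
    std::string direction;   // e.g. "Rx"
    int field1 = 0;          // meaning of the three integer columns is assumed unknown
    int field2 = 0;
    int field3 = 0;
    std::string payloadHex;  // payload as a hex string
    std::string trailer;     // optional last column, empty when absent
};

// Parses one whitespace-separated line into `out`.
// Returns false (and leaves `out` untouched) if a mandatory column is missing.
bool parseLine(const std::string& line, Frame& out) {
    std::istringstream in(line);
    Frame f;
    if (!(in >> f.timestamp >> f.protocol >> f.direction
             >> f.field1 >> f.field2 >> f.field3 >> f.payloadHex)) {
        return false;
    }
    in >> f.trailer;  // optional column; stays empty if the line ends here
    out = f;
    return true;
}
```

The payload is kept as a hex string here; converting it to bytes could be deferred until a client actually asks for that frame.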

Currently, I do this with a circular queue from Boost.

The requirement is to access different parts of the data at different times.

For example, if I am currently at 100 s and want the data at 0.03 s, what is the best way to get it, rather than tracking the file pointer or loading the whole file into memory, which causes a performance bottleneck? (Assume a 2 GB file containing the kind of data shown above.)


Solution

  • Here is the solution I found:

    1. Used circular buffers (Boost lock-free buffers) to parse the file and to hold each line in structured form.
    2. Used separate threads:
      • One continuously parses the file and pushes lines to a lock-free queue.
      • One continuously reads from that buffer, processes each line, forms a structure, and pushes it to another queue.
      • Whenever the user needs random data for a given time, I move the file pointer to the matching line and read only that line.
    3. Both threads use a mutex-based wait to pause once the predefined buffer limit is reached.
    4. The user can get data at any time, and the complete file contents never need to be stored. Each frame is deleted from the queue as soon as it is read, so the file size doesn't matter. Because parallel threads keep the buffers filled, no time is spent reading the file on each request.
    5. To move to another line, I move the file pointer, wipe the existing data, and start the threads again.

    Note: The only remaining issue is moving the file pointer to a particular line. Right now I have to parse line by line until I reach that point.

    Any way to move the file pointer directly to the required line would be helpful. Binary search, or any other efficient search algorithm, would get me what I want.
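One way the binary search could look, as a sketch rather than a tested solution for the real file: bisect on byte offsets, and after each probe skip forward to the next line start before reading a timestamp. This assumes timestamps never decrease through the file, every line is well formed, and the stream is opened in binary mode so that offsets are exact. The function names are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Offset of the first line that starts at or after `pos` (may equal `size`).
std::streamoff lineStartAtOrAfter(std::istream& in, std::streamoff pos,
                                  std::streamoff size) {
    if (pos <= 0) return 0;
    if (pos >= size) return size;
    in.clear();
    in.seekg(pos - 1);
    std::string skipped;
    std::getline(in, skipped);  // consume through the next '\n'
    if (in.eof()) return size;  // ran off the end: no further line start
    return in.tellg();
}

// Reads the leading timestamp of the line starting at `start`.
bool timestampAt(std::istream& in, std::streamoff start, double& t) {
    in.clear();
    in.seekg(start);
    std::string line;
    if (!std::getline(in, line)) return false;
    std::istringstream fields(line);
    return static_cast<bool>(fields >> t);
}

// Positions `in` at the first line whose timestamp is >= target and returns
// that line's byte offset, or -1 if every line is earlier than target.
std::streamoff findFirstAtOrAfter(std::istream& in, double target) {
    in.clear();
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    std::streamoff lo = 0, hi = size;  // the predicate is true at hi by construction
    while (lo < hi) {
        const std::streamoff mid = lo + (hi - lo) / 2;
        const std::streamoff start = lineStartAtOrAfter(in, mid, size);
        double t = 0.0;
        // A malformed line is treated as a hit; well-formed input is assumed.
        const bool atOrAfter =
            start >= size || !timestampAt(in, start, t) || t >= target;
        if (atOrAfter) hi = mid; else lo = mid + 1;
    }
    const std::streamoff start = lineStartAtOrAfter(in, lo, size);
    if (start >= size) return -1;
    in.clear();
    in.seekg(start);
    return start;
}
```

Each probe reads at most a couple of lines, so a 2 GB file needs roughly 31 probes instead of a linear scan. The functions take a `std::istream&`, so the same code can be tried on a `std::istringstream` before pointing it at a `std::ifstream` opened with `std::ios::binary`.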

    I would appreciate it if anybody has a solution to this new issue!
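Since the parsing thread already walks the file from front to back, another option is to record a sparse index on that first pass: the timestamp and byte offset of every Nth line. A later jump then becomes a lookup in a small in-memory table plus a short forward scan. The sketch below is an assumption-laden illustration, not the code from the solution above: `IndexEntry`, `buildSparseIndex`, `seekToTime`, and the `stride` parameter are all hypothetical, and timestamps are assumed to be non-decreasing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

struct IndexEntry {
    double timestamp;       // leading timestamp of the indexed line
    std::streamoff offset;  // byte offset where that line starts
};

// One pass over the stream, remembering every `stride`-th line.
std::vector<IndexEntry> buildSparseIndex(std::istream& in, std::size_t stride) {
    assert(stride > 0);
    std::vector<IndexEntry> index;
    in.clear();
    in.seekg(0);
    std::string line;
    std::size_t lineNo = 0;
    for (;;) {
        const std::streamoff pos = in.tellg();
        if (!std::getline(in, line)) break;
        if (lineNo++ % stride == 0) {
            std::istringstream fields(line);
            double t;
            if (fields >> t) index.push_back({t, pos});
        }
    }
    return index;
}

// Positions `in` at the first line whose timestamp is >= target and returns
// its byte offset, or -1 if there is no such line.
std::streamoff seekToTime(std::istream& in, const std::vector<IndexEntry>& index,
                          double target) {
    // First indexed line at or after target; the scan starts one entry earlier,
    // because the wanted line may sit between two indexed lines.
    auto it = std::lower_bound(
        index.begin(), index.end(), target,
        [](const IndexEntry& e, double t) { return e.timestamp < t; });
    const std::streamoff start = (it == index.begin()) ? 0 : std::prev(it)->offset;
    in.clear();
    in.seekg(start);
    std::string line;
    for (;;) {
        const std::streamoff pos = in.tellg();
        if (!std::getline(in, line)) return -1;
        std::istringstream fields(line);
        double t;
        if (fields >> t && t >= target) {
            in.clear();
            in.seekg(pos);  // rewind so the caller reads this line next
            return pos;
        }
    }
}
```

The stride trades memory for scan length: a larger stride keeps the index small, while a smaller one shortens the forward scan after each jump. As with any offset-based seeking, a real file would need to be opened in binary mode for the stored offsets to stay valid.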