I am developing an importer program in C# for large UTF-8 text files (where characters have variable byte lengths). Loading an entire 20 GB file into RAM is neither practical nor possible, so it is better to split the file into multiple smaller files for processing. My problem is the splitting step itself. My current solution reads the file line by line and starts a new output file every N lines (sketched below), but reading line by line just to split is slow; the splitting time is too high. Is there a faster algorithm for splitting large UTF-8 files into multiple files without reading them line by line?
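For reference, this is roughly what I am doing now (simplified; the file names and the line count are placeholders):

```csharp
using System.IO;

class LineSplitter
{
    // Current approach, simplified: decode every line and start a new
    // part file every linesPerFile lines. Decoding 20 GB of text line
    // by line is the slow part.
    static void Main()
    {
        const int linesPerFile = 1_000_000; // placeholder value
        int part = 0, count = 0;
        StreamWriter? writer = null;

        foreach (string line in File.ReadLines("input.txt"))
        {
            if (count++ % linesPerFile == 0)
            {
                writer?.Dispose();
                writer = new StreamWriter($"part_{part++}.txt");
            }
            writer!.WriteLine(line); // assigned on the first iteration
        }
        writer?.Dispose();
    }
}
```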
My suggestion for your problem is below. I approached it with separation of concerns in mind: splitting the file and processing it can be kept as separate steps, which makes the program easier to maintain.
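For the splitting step itself, you do not need to decode lines at all. A UTF-8 file can be cut at arbitrary byte offsets as long as each cut lands just after a 0x0A (`'\n'`) byte; in UTF-8, 0x0A can only ever be a real line feed, because continuation bytes of multi-byte characters are always in the range 0x80-0xBF. That means you can copy raw bytes in large buffers and only scan for `'\n'` near each chunk boundary. Here is a minimal sketch of such a splitter (the class, method, and parameter names are my own, and error handling is omitted):

```csharp
using System;
using System.IO;

static class FileSplitter
{
    // Split a large UTF-8 file into parts of roughly chunkSizeBytes each,
    // extending every part to the next '\n' so no line is cut in half.
    // Copies raw bytes; the text is never decoded.
    public static void Split(string inputPath, string outputDir, long chunkSizeBytes)
    {
        const int bufferSize = 1 << 20; // 1 MB copy buffer
        var buffer = new byte[bufferSize];
        int part = 0;

        using var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read,
                                         FileShare.Read, bufferSize);
        while (input.Position < input.Length)
        {
            string partPath = Path.Combine(outputDir, $"part_{part++:D4}.txt");
            using var output = new FileStream(partPath, FileMode.Create, FileAccess.Write,
                                              FileShare.None, bufferSize);
            long written = 0;
            bool boundaryFound = false;

            while (!boundaryFound)
            {
                int read = input.Read(buffer, 0, buffer.Length);
                if (read == 0) break; // end of input

                int cut = read;
                if (written + read >= chunkSizeBytes)
                {
                    // Target size is reached inside this buffer: cut right
                    // after the first '\n' at or past the target offset.
                    int searchFrom = (int)Math.Max(0, chunkSizeBytes - written);
                    for (int i = searchFrom; i < read; i++)
                    {
                        if (buffer[i] == (byte)'\n')
                        {
                            cut = i + 1;
                            boundaryFound = true;
                            break;
                        }
                    }
                }

                output.Write(buffer, 0, cut);
                written += cut;

                // Hand unconsumed bytes back to the next part by rewinding.
                if (cut < read)
                    input.Seek(cut - read, SeekOrigin.Current);
            }
        }
    }
}
```

Usage would be something like `FileSplitter.Split("input.txt", "parts", 512L * 1024 * 1024);` for ~512 MB parts. Because the splitter only moves bytes, it runs at close to disk copy speed, and the processing side can then consume the part files independently, even in parallel.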