I have multiple files coming from mainframe systems, basically EBCDIC data. Some of these files contain data from multiple modules appended into one single file. For example, let's say I have a file CISA that holds data from multiple sub-modules. All of these modules have a row length of 1000 bytes but different data structures, so to read them I need a different layout for each. To do that I need to split the parent file into multiple files based on a key value at a fixed location, let's say byte range 20-23. For the first row the value in bytes 20-23 may be 0001, for the next row 0002, and so on, so I need to split the file into multiple files based on the value in that byte range.
In my current C# implementation, I read the data as a byte stream, one row at a time. I use a DataTable with two columns: the first stores a file name generated from the byte-range (20-23) value, and the second stores the byte stream I just read.
Once the entire file is read, the DataTable gives me a list of file names and the byte streams for those files. I then loop through the DataTable and write each row to the file named in the first column.
This solution works, but performance is really slow because of the high I/O from writing the DataTable out row by row. Is there a way to skip writing the data for each row and instead save each entire partition in one shot?
Firstly, I'd completely forget about `DataTable` here - that seems a terrible idea. How big are the files? If they're small: just load all the data (`File.ReadAllBytes`) and use an `ArraySegment<byte>` for each row (maybe a `List<ArraySegment<byte>>`) - or, if you're OK using preview bits, this would be a great use of `Span<byte>` (similar to `ArraySegment<byte>`, but more ... just more).

If the file is large, I'd look at `MemoryMappedFile` here; it seems a great fit.
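The small-file approach could be sketched like this. It's a minimal, hypothetical example, not the asker's actual code: the demo input, the 0-based key offset, and the ASCII key decoding are all assumptions (real EBCDIC data would need the proper code page, e.g. IBM037). Rows are grouped by key as `ArraySegment<byte>` views over one shared buffer, so nothing is copied until each partition is written with a single output stream per file:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

const int RowLength = 1000;  // fixed record length from the question
const int KeyOffset = 20;    // assumption: the 20-23 range is 0-based
const int KeyLength = 4;

// Build a small demo input: four 1000-byte rows with keys 0001/0002.
string inputPath = Path.GetTempFileName();
string outputDir = Path.Combine(Path.GetTempPath(), "split-demo");
Directory.CreateDirectory(outputDir);

var demoKeys = new[] { "0001", "0002", "0001", "0002" };
using (var demo = File.Create(inputPath))
{
    foreach (string k in demoKeys)
    {
        byte[] row = new byte[RowLength];
        System.Text.Encoding.ASCII.GetBytes(k).CopyTo(row, KeyOffset);
        demo.Write(row, 0, row.Length);
    }
}

// Read the whole file once and group rows by key without copying them.
byte[] data = File.ReadAllBytes(inputPath);
var partitions = new Dictionary<string, List<ArraySegment<byte>>>();
for (int offset = 0; offset + RowLength <= data.Length; offset += RowLength)
{
    // For real EBCDIC data, decode the key with the appropriate code page
    // (e.g. Encoding.GetEncoding(37)) instead of ASCII.
    string key = System.Text.Encoding.ASCII.GetString(
        data, offset + KeyOffset, KeyLength);
    if (!partitions.TryGetValue(key, out var rows))
        partitions[key] = rows = new List<ArraySegment<byte>>();
    rows.Add(new ArraySegment<byte>(data, offset, RowLength));
}

// One output stream per partition instead of one write per row.
foreach (var kvp in partitions)
{
    string path = Path.Combine(outputDir, kvp.Key + ".dat");
    using (var output = File.Create(path))
        foreach (var row in kvp.Value)
            output.Write(row.Array, row.Offset, row.Count);
}

// Each output file now holds all rows for its key (2 x 1000 bytes here).
Console.WriteLine(string.Join(",", partitions.Keys.OrderBy(k => k)));
```

For files too big to load at once, the same grouping loop can instead run over a `MemoryMappedViewAccessor`, reading each row's key without pulling the whole file into managed memory.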