I am planning to create a sample .NET WPF application that reads MAINFRAME bin files, compares them side by side, and extracts their content without losing any formatting.
Answers to the questions below would really help me design an algorithm.
- What are the general encoding standards of a typical MAINFRAME bin file?
- How do we read a bin file? Do we need to use any tool?
- How do we find differences between two MAINFRAME bin files?
- Will we be able to identify a MAINFRAME bin file without using any tool, that is, from the file itself?
- What kinds of output files are generated or read by a MAINFRAME? Is it restricted to “.bin” files, or are other extensions used as well?
Please also point me to any open-source editor that reads, writes, and edits MAINFRAME bin files and could serve as a starting point.
Record editor is one of the applications I use for some of my bin files, although it seems to be incompatible with others.
There is no such thing as a "MAINFRAME bin file": mainframe datasets have neither file types nor file extensions. After transferring files from the mainframe to a Windows or Unix machine, some people use the .bin extension to indicate that the content should be treated as binary.
Some things you should know about mainframe files (I'm mostly talking about fixed-record-length sequential datasets, the most basic kind of dataset):
- There is no common file format. Usually each mainframe program rolls its own record descriptions and writes those to a file. To make sense of the file you have to know the record description.
- You can read those files with any tool that is able to process an arbitrary sequence of bytes, but again: to make use of it you must interpret those bytes somehow.
- Mainframe datasets don't use newline characters to mark the end of one data record and the beginning of the next. Most datasets used for transfer have a fixed number of bytes per record, so the reading program has to split the bytes it reads at the record boundaries. (There are also variable-record-length files, but I'll skip these for now.)
- Character data on mainframes is usually encoded in some EBCDIC character set, as opposed to the ASCII used on PC platforms, so you'll have to convert it for PC-based processing.
- Other sections might contain packed-decimal numeric data: each digit occupies half a byte (a "nibble"), and the last nibble indicates the sign of the number. E.g. -4387 might be stored in 3 bytes as 04 38 7D.
- Your files might also contain raw binary or floating-point data - we can't tell, you'll have to ask the creator of the file.
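To make the points above more concrete, here is a minimal sketch in Python (the same logic ports directly to C#). It assumes a made-up record layout of 13 bytes: a 10-byte EBCDIC name followed by a 3-byte packed-decimal amount. The layout, the function names, and the choice of code page cp037 (US/Canada EBCDIC) are my assumptions for illustration only; your real files will need the record description and code page used by whoever created them.

```python
RECORD_LENGTH = 13      # hypothetical layout: 10-byte name + 3-byte packed amount
EBCDIC_CODEC = "cp037"  # assumption: US/Canada EBCDIC; other sites use other code pages


def split_records(data: bytes, record_length: int = RECORD_LENGTH) -> list[bytes]:
    """Cut the byte stream into fixed-length records (there are no newlines)."""
    if len(data) % record_length != 0:
        raise ValueError("file size is not a multiple of the record length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def unpack_decimal(raw: bytes) -> int:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    if not raw:
        raise ValueError("empty packed-decimal field")
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)   # the last byte holds one digit ...
    sign = raw[-1] & 0x0F         # ... and the sign nibble
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(str(d) for d in digits))
    # 0xD (and the less common 0xB) mark negative; 0xA, 0xC, 0xE, 0xF are non-negative
    return -value if sign in (0x0B, 0x0D) else value


def parse_record(record: bytes) -> tuple[str, int]:
    """Interpret one record according to the hypothetical layout above."""
    name = record[:10].decode(EBCDIC_CODEC).rstrip()
    amount = unpack_decimal(record[10:13])
    return name, amount


if __name__ == "__main__":
    # Build two sample records in memory instead of reading a real file.
    sample = (
        "SMITH     ".encode(EBCDIC_CODEC) + bytes.fromhex("04387D")
        + "JONES     ".encode(EBCDIC_CODEC) + bytes.fromhex("00012C")
    )
    for rec in split_records(sample):
        print(parse_record(rec))
```

With a real file you would read the bytes in binary mode (for example with `open(path, "rb")`) and pass them to `split_records`. Note that the text and numeric fields need different treatment: running an EBCDIC-to-ASCII conversion over the whole record would corrupt the packed-decimal bytes.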
To sum it up:
- you can't recognize or process them with some out-of-the-box-tool
- you have to know the record-description to do anything useful
- you'll have to cope with lots of mainframe-specific stuff
For most use cases, transferring raw binary files from mainframe to PC is not feasible. It is better to create EBCDIC text-based files on the mainframe, let your file-transfer tool convert them to ASCII during the transfer, and process the text data.