I am trying to set up Xcode to get rid of non-human-readable characters in legacy text files recovered from 8” floppy disks created in 1986. The files were created under QDOS, a proprietary disk operating system, using a text-based application called Music Composition Language (MCL).
I aim to write a C program that reads the ASCII file character by character, filters out non-printable characters and saves the result to a destination file, so that the file contents can be viewed in exactly the same format a composer would have seen in 1986.
When Xcode reads the legacy text file, an unwanted character appears as the first human-readable character of every line except the first.
!B=24:Af
* BAR 1
G2,6
* BAR 2 & 3
!G2,1/4:Bf2,1/4:C2,1/4:Ef2,1/4:F3,1/4:G3,35/4:D3:A4
"* BAR 4
#Bf4:G4,2:D3:A4:Bf4
$* BAR 5
%D4,2:C4,3:F5
&* BAR 6
'D4:Bf4:A4,2:G4:D3:?
(* BAR 7 &
A hex dump of the above text file shows the two ASCII bytes $0D (Carriage Return) followed by $1C (File Separator) between lines, although in the dump below the end-of-line byte appears as $0A (Line Feed). These two bytes, plus the byte that immediately follows them, are the characters I am trying to remove.
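A minimal sketch of the removal step, working on bytes already held in memory. It assumes that $1C never occurs inside genuine MCL text and that exactly one byte follows each $1C. Judging from the dump, that byte looks like a per-line counter ($1D, $1E, $1F, $20, $21 and so on), which would explain why it is sometimes printable ('!', '"', '#') and why a plain "filter out non-printable characters" pass cannot remove it. That is an inference from this one file, not documented QDOS behaviour. The function name mcl_strip_prefixes is hypothetical, and the end-of-line byte is deliberately kept so that lines stay separate.

```c
#include <stddef.h>

#define MCL_FS 0x1C  /* ASCII File Separator, as seen in the hex dump */

/* Hypothetical helper: copy n bytes from src to dst, dropping every File
 * Separator (0x1C) and the single byte that immediately follows it
 * (assumed to be a per-line counter). All other bytes, including the
 * end-of-line byte, are copied unchanged. dst must have room for n bytes.
 * Returns the number of bytes written to dst. */
size_t mcl_strip_prefixes(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] == MCL_FS) {
            i++;            /* also skip the byte after the separator */
            continue;
        }
        dst[out++] = src[i];
    }
    return out;
}
```

Because the output can only be shorter than the input, a destination buffer the same size as the file is always large enough.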
0000: 1C 1D 21 42 3D 32 34 3A 41 66 0A 1C 1E 2A 20 20 ¿¿!B=24:Af¬¿¿*
0010: 20 20 20 20 20 20 20 20 20 42 41 52 20 31 0A 1C BAR 1¬¿
0020: 1F 47 32 2C 36 0A 1C 20 2A 20 20 20 20 20 20 20 ¿G2,6¬¿ *
0030: 20 20 20 20 42 41 52 20 32 20 26 20 33 0A 1C 21 BAR 2 & 3¬¿!
0040: 47 32 2C 31 2F 34 3A 42 66 32 2C 31 2F 34 3A 43 G2,1/4:Bf2,1/4:C
0050: 32 2C 31 2F 34 3A 45 66 32 2C 31 2F 34 3A 46 33 2,1/4:Ef2,1/4:F3
0060: 2C 31 2F 34 3A 47 33 2C 33 35 2F 34 3A 44 33 3A ,1/4:G3,35/4:D3:
0070: 41 34 0A 1C 22 2A 20 20 20 20 20 20 20 20 20 20 A4¬¿"*
0080: 20 42 41 52 20 34 20 0A 1C 23 42 66 34 3A 47 34 BAR 4 ¬¿#Bf4:G4
0090: 2C 32 3A 44 33 3A 41 34 3A 42 66 34 0A 1C 24 2A ,2:D3:A4:Bf4¬¿$*
00A0: 20 20 20 20 20 20 20 20 20 20 20 42 41 52 20 35 BAR 5
00B0: 0A 1C 25 44 34 2C 32 3A 43 34 2C 33 3A 46 35 0A ¬¿%D4,2:C4,3:F5¬
00C0: 1C 26 2A 20 20 20 20 20 20 20 20 20 20 20 42 41 ¿&* BA
00D0: 52 20 36 0A 1C 27 44 34 3A 42 66 34 3A 41 34 2C R 6¬¿'D4:Bf4:A4,
00E0: 32 3A 47 34 3A 44 33 3A 3F 0A 1C 28 2A 20 20 20 2:G4:D3:?¬¿(*
00F0: 20 20 20 20 20 20 20 20 42 41 52 20 37 20 26 20 BAR 7 &
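For anyone reproducing the dump: macOS already ships xxd and hexdump -C, which print files in a similar layout. If it is more convenient to dump bytes from inside the C program, a small sketch follows. hex_dump is a hypothetical name, and unlike the tool used above it shows every non-printable byte as '.' rather than ¬ or ¿.

```c
#include <ctype.h>
#include <stdio.h>

/* Print buf as a hex dump: a 4-digit hex offset, up to 16 bytes in hex,
 * then the same bytes as text with every non-printable byte shown as '.'.
 * Short final rows are padded so the text column stays aligned. */
void hex_dump(const unsigned char *buf, size_t n, FILE *out)
{
    for (size_t off = 0; off < n; off += 16) {
        fprintf(out, "%04zX:", off);
        for (size_t i = off; i < off + 16; i++) {
            if (i < n)
                fprintf(out, " %02X", buf[i]);
            else
                fputs("   ", out);        /* pad a short last row */
        }
        fputc(' ', out);
        for (size_t i = off; i < off + 16 && i < n; i++)
            fputc(isprint(buf[i]) ? buf[i] : '.', out);
        fputc('\n', out);
    }
}
```

For example, hex_dump(buffer, filesize, stdout) after the whole file has been read into buffer.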
I created an Xcode Command Line Tool project. When I select Type: Plain Text and Text Encoding: Unicode (UTF-8) in the Xcode Inspectors window, the same single printable character is visible. I chose those settings because my macOS expects en_AU.UTF-8.
The C code that follows makes an identical copy of the text file without examining individual characters. Essentially, it reads the old file's contents and writes them to a new file. The hex dump of the output file is identical to the hex dump above.
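#include <stdio.h>
#include <stdlib.h>
int main(int argc, const char * argv[]) {
    // fopen() does not expand "~", so "~/Desktop/MCLRead/bell1.ss" fails with
    // "No such file or directory"; build the full path from $HOME instead.
    const char* home = getenv("HOME");
    if (!home) { fprintf(stderr, "HOME is not set\n"); return EXIT_FAILURE; }
    char filename[1024];
    snprintf(filename, sizeof filename, "%s/Desktop/MCLRead/bell1.ss", home);
    printf("MCLRead\n\t%s\n", filename);
    FILE* fin = fopen(filename, "rb");
    if (!fin) { perror("input error"); return EXIT_FAILURE; }
    // output.txt is created relative to the current working directory
    FILE* fout = fopen("output.txt", "wb");
    if (!fout) { perror("fout error"); fclose(fin); return EXIT_FAILURE; }
    fseek(fin, 0, SEEK_END); // go to the end of file
    long filesize = ftell(fin); // get file size (-1 on error)
    fseek(fin, 0, SEEK_SET); // go back to the beginning
    if (filesize < 0) { perror("ftell"); fclose(fin); fclose(fout); return EXIT_FAILURE; }
    // allocate enough memory (+1 so an empty file still gets a valid buffer)
    unsigned char* buffer = malloc((size_t)filesize + 1);
    if (!buffer) { perror("malloc"); fclose(fin); fclose(fout); return EXIT_FAILURE; }
    // read one character at a time (or `fread` the whole file)
    size_t i = 0;
    while (i < (size_t)filesize)
    {
        int c = fgetc(fin); // int, so EOF stays distinct from every byte value
        if (c == EOF) break;
        // save to buffer
        buffer[i++] = (unsigned char)c;
    }
    // write the bytes back out unchanged, so output.txt is an identical copy
    fwrite(buffer, 1, i, fout);
    free(buffer);
    fclose(fin);
    fclose(fout);
    return 0;
}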
However, when I compile, build and run this in Xcode, the characters are unrecognisable regardless of the Type or Text Encoding settings in the Xcode Inspectors window. The following error message appears in the Console window:
error: No such file or directory
Program ended with exit code: 0
When I run the same code in the Terminal window, it generates an output text file, but the characters are unrecognisable:
Desktop % gcc main.c
Desktop % ./a.out output.txt
Desktop % cat output.txt
cat results in a string of 128 ? characters in the Terminal command line, a total of 128 even though the file contains more than a thousand characters.
Can someone give me any clues for making this text file readable in a format that allows the non-human-readable characters to be stripped from the start of each line?
Please note, I am not asking for help to write the C code, but rather which text format will make the unwanted 8-bit characters readable so I can remove them (a slight refinement of the question I asked initially). Any further help would be most appreciated. Thanks in advance.
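One point that may help with the format question: no Type or Text Encoding setting will make these bytes readable, because $0D and $1C to $1F are control codes rather than characters, in ASCII and UTF-8 alike, so an editor or terminal has no ordinary glyph to show for them. The usual workaround is to print them in caret notation, which is what cat -v does in Terminal ($1C appears as ^\ and $0D as ^M). A minimal sketch of the same idea in C, where put_visible is a hypothetical helper:

```c
#include <stdio.h>

/* Write byte value c (0-255, as returned by fgetc after the EOF check)
 * to out in caret notation, similar in spirit to `cat -v`:
 *   - Line Feed is passed through so lines stay separate,
 *   - other control codes 0x00-0x1F become '^' followed by (c + 0x40),
 *   - DEL (0x7F) becomes "^?",
 *   - bytes >= 0x80 are written as "\xHH" (a simplification; cat -v
 *     uses an "M-" prefix instead). */
void put_visible(int c, FILE *out)
{
    if (c == '\n') {
        fputc('\n', out);
    } else if (c < 0x20) {
        fputc('^', out);
        fputc(c + 0x40, out);
    } else if (c == 0x7F) {
        fputs("^?", out);
    } else if (c >= 0x80) {
        fprintf(out, "\\x%02X", (unsigned)c);
    } else {
        fputc(c, out);
    }
}
```

Feeding it every byte read with fgetc() makes the $1C bytes, and the byte that follows each of them, visible at the start of each line, which makes it easy to confirm exactly what needs removing.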
Note
This post has been revised in response to comments.
The hex dump is now given as text rather than as an image, which is the most reliable way to share the file for anyone who wants to test what I have done.
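The problem can be solved easily by reading each byte into an int rather than a char (fgetc() returns an int holding either a byte value from 0 to 255 or EOF). Each byte is examined as a number, kept or dropped, and the kept bytes are written back out and read as text.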
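To illustrate why int matters here: fgetc() returns an int that is either a byte value from 0 to 255 (so 8-bit values such as the $E5 fill bytes are possible) or the negative constant EOF. Assigning that result to a char before testing it loses the distinction; where char is signed, as it is on macOS, a $FF byte typically compares equal to EOF and ends the loop early. A minimal sketch, with count_bytes as a hypothetical example function:

```c
#include <stdio.h>

/* Count the bytes in a stream the safe way: keep fgetc()'s result in an
 * int, so every byte value 0x00-0xFF stays distinct from EOF (a negative
 * int). Storing the result in a char before the EOF test would lose that
 * distinction. */
long count_bytes(FILE *in)
{
    long n = 0;
    int c;                          /* int, not char */
    while ((c = fgetc(in)) != EOF)
        n++;
    return n;
}
```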
Note: there is no end-of-file character. MCL used the word END to mark the end of the file. Because the file has been salvaged from a floppy disk image, it sometimes ends with a trailing run of hex E5 bytes, written to the floppy disk when it was formatted. At other times, where the format track has already been overwritten, the file ends with a trailing run of zeros.
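Since the padding always sits at the end of the file, one alternative to stopping at the first $E5 or zero byte is to trim the padding from the end of a buffer that already holds the whole file. The sketch assumes genuine MCL text never ends in $E5 or $00, which should hold if every file finishes with END; logical_length is a hypothetical name.

```c
#include <stddef.h>

#define FD_FORMAT 0xE5  /* fill byte left by the floppy formatter */

/* Return the length of buf once trailing format-fill (0xE5) and zero
 * bytes are ignored. Nothing is copied; the caller simply uses the
 * returned length instead of the full file size. */
size_t logical_length(const unsigned char *buf, size_t n)
{
    while (n > 0 && (buf[n - 1] == FD_FORMAT || buf[n - 1] == 0x00))
        n--;
    return n;
}
```

Unlike a loop that breaks at the first $E5 or $00, this leaves the text untouched if such a byte ever turns up in the middle of a file.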
#include <stdio.h>
#include <stdlib.h>
#define CR 0x0D // ASCII Carriage Return
#define FS 0x1C // ASCII File Separator
#define FD_FORMAT 0xE5 // fill byte written when the floppy disk was formatted
int main(int argc, const char * argv[])
{
    char fname[256];
    printf("\n Enter MCL file name : ");
    if (scanf("%255s", fname) != 1) return EXIT_FAILURE; // width limit prevents overflow
    printf("\n\t%s\n", fname);
    int a = 0; // byte read two positions back
    int b = a; // byte read one position back
    FILE* fin = fopen(fname, "rb"); // init read (binary: no byte translation)
    if (!fin)
    { perror("input error"); return EXIT_FAILURE;
    }
    FILE* fout = fopen("output.txt", "wb"); // init write
    if (!fout)
    { perror("fout error"); fclose(fin); return EXIT_FAILURE;
    }
    fseek(fin, 0, SEEK_END); // look for end of file
    long fsize = ftell(fin); // get file size (-1 on error)
    fseek(fin, 0, SEEK_SET); // go back to the start
    if (fsize < 0) { perror("ftell"); fclose(fin); fclose(fout); return EXIT_FAILURE; }
    // allocate buffer (+1 so an empty file still gets a valid allocation)
    int* buffer = malloc(((size_t)fsize + 1) * sizeof(int));
    if (!buffer) { perror("malloc"); fclose(fin); fclose(fout); return EXIT_FAILURE; }
    size_t n = 0; // number of bytes actually kept
    while (1)
    {
        int c = fgetc(fin); // read one byte at a time (0-255, or EOF)
        if (c == EOF || c == 0x00) break; // stop at end of file or trailing zero padding
        if (c == FD_FORMAT) break; // stop at the floppy format fill
        printf("\t%02X\t%02X\t%02X\n", a, b, c); // debug trace: previous two bytes and current byte
        // drop the File Separator and the byte that follows it; keep everything else
        if ((c != FS) && (a != CR) && (b != FS))
            buffer[n++] = c; // save to buffer
        a = b;
        b = c;
    }
    for (size_t i = 0; i < n; i++) // write out only the bytes that were kept
        fputc(buffer[i], fout);
    free(buffer);
    fclose(fin);
    fclose(fout);
    return 0;
}
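For comparison, a minimal sketch of the same filter written as a single streaming pass with no intermediate buffer. It assumes the layout seen in the dump (an end-of-line byte, then $1C, then one counter byte) and additionally turns each $0D into '\n', on the assumption that the output will be viewed with macOS tools such as cat, which expect Line Feed line endings. mcl_filter is a hypothetical name; it would be called from main with the two FILE* handles the program above already opens.

```c
#include <stdio.h>

#define CR        0x0D  /* ASCII Carriage Return */
#define FS        0x1C  /* ASCII File Separator */
#define FD_FORMAT 0xE5  /* floppy disk format fill byte */

/* Copy an MCL file from in to out, one byte at a time:
 *  - stop at EOF, at a zero byte or at a format-fill byte (0xE5),
 *  - drop each File Separator and the line-counter byte after it,
 *  - turn a Carriage Return into '\n' so lines display on macOS.
 * Returns the number of bytes written. */
long mcl_filter(FILE *in, FILE *out)
{
    long written = 0;
    int c;                                  /* int, so EOF is distinct */
    while ((c = fgetc(in)) != EOF) {
        if (c == 0x00 || c == FD_FORMAT)
            break;                          /* trailing padding: stop */
        if (c == FS) {
            if (fgetc(in) == EOF)           /* discard the counter byte */
                break;
            continue;
        }
        if (c == CR)
            c = '\n';
        fputc(c, out);
        written++;
    }
    return written;
}
```

Opening the files with "rb" and "wb" keeps the byte handling identical on platforms that translate line endings in text mode.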