This is a simple question but I just can't seem to figure it out. I need a little help. I have a file called programFile that looks like this:
start mul val1
<--tab-->ldb #2
<--tab-->addr A B
<--tab-->float
loop lda #1
<--tab-->sta val2
<--tab-->j loop
val1 word 12
val2 word 0
I want to take only the lines that do not start with the tab and do something with them (for now I just want to print the first word in that line), so only the lines start mul val1, loop lda #1, val1 word 12 and val2 word 0. The output should be:
start
loop
val1
val2
Eventually, I also want to take the lines that start with the tab and do something different with them, but this is my poor attempt at solving it:
while(ch = fgetc(programFile) != EOF){
    if(ch == '\t'){
        while(ch != '\n'){
            ch = fgetc(programFile);
        }
    }else{
        fscanf(programFile, "%s", symbol);
        printf("%s\n", symbol);
    }
}
And this is my output:
tart
ldb
addr
float
loop
sta
j
val1
val2
Continuing from the comment, while there is nothing wrong with using character-oriented input functions (e.g. fgetc, getc), when you need to handle "lines" of data you are far better served by the line-oriented input functions provided by the C library (fgets) or POSIX (getline), and then parsing the needed information from each line.
Why? Basically, convenience and efficiency. Line-oriented input functions hand you many characters per call instead of one, which for large input files can noticeably reduce the per-character call overhead. Next, regardless of the contents, you will read the whole line (provided there is sufficient storage for the line when using fgets; otherwise you will read it in multiple chunks until the complete line is read. getline will automatically allocate (and reallocate) sufficient storage to hold each line).
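To make the "multiple chunks" point concrete, here is a minimal sketch that counts lines with a deliberately small fgets buffer. The helper name count_lines and the 16-byte buffer are my own choices for illustration; a chunk only counts as the end of a line when it ends in '\n'.

```c
#include <stdio.h>
#include <string.h>

/* hypothetical helper: count the lines in fp, however long they are,
 * using a small fixed buffer (16 bytes is an arbitrary choice) */
size_t count_lines (FILE *fp)
{
    char buf[16];
    size_t lines = 0;
    int partial = 0;                    /* 1 while inside an unfinished line */

    while (fgets (buf, sizeof buf, fp)) {
        size_t len = strlen (buf);
        if (len && buf[len - 1] == '\n') {  /* chunk completes a line */
            lines++;
            partial = 0;
        }
        else                            /* line continues in the next chunk */
            partial = 1;
    }
    if (partial)                        /* last line had no trailing '\n' */
        lines++;

    return lines;
}
```

The same check (is the last character a '\n'?) is how you can tell, after any fgets call, whether your buffer was big enough for the line.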
You then have tools like sscanf, strtok, strsep, strstr, strchr, etc., with which you can parse whatever you need from the stored line. (You can also always use simple pointer arithmetic to parse any line with a pointer, or pair of pointers, "walking the string" and comparing each character as you go.) In-memory operations on each character in a stored string are generally much faster than performing the same operation with a separate input-function call for each character.
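As one possible illustration of parsing a stored line, the sketch below uses sscanf to split a non-tab line such as "start mul val1" into up to three fields. The helper name split_line, the FIELD size and the label/op/arg naming are assumptions of mine, not anything required by your file format.

```c
#include <stdio.h>

#define FIELD 32    /* assumed maximum field size, including the nul byte */

/* hypothetical helper: split line into label, op and the rest (arg).
 * Returns the number of fields converted (EOF for an empty line). */
int split_line (const char *line, char *label, char *op, char *arg)
{
    label[0] = op[0] = arg[0] = 0;      /* start with empty strings */

    /* the field widths (31) keep sscanf from writing past FIELD bytes */
    return sscanf (line, "%31s %31s %31[^\n]", label, op, arg);
}
```

Always check the return value: a line like "val1 word 12" yields 3, while a line holding a single word yields 1 and leaves op and arg empty.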
When you are concerned about the beginning character of each line, you only need compare line[0] (or simply *line) against whatever character you are looking for.
The following is a simple example that reads from the input filename given as the first argument (or from stdin by default if no filename is given) and then tests the beginning character of each line. If the line begins with a tab, it simply outputs the line preceded by a tab (after having skipped the tab read from the file by outputting line + 1) followed by " - begins with tab" (you can handle those lines however you like, or skip them entirely); otherwise it outputs the line itself followed by " - no tab". The handling of the differing prefixed lines is completely up to you. You can build an array of pointers holding each different type of line, or use a struct containing the commands and tabbed content in pointer arrays to preserve the line associations (which commands go with which tabbed lines), if required.
The only other note on line-oriented input functions is that they read up to and including the trailing '\n'. You generally do not want to store strings with newlines dangling off the ends, so you will want to trim the newline by overwriting the trailing '\n' with the nul-terminating character. The example does this by getting the length of each line with strlen and then overwriting the newline with 0 (which is equivalent to the character '\0'). I don't like to type...
#include <stdio.h>
#include <string.h>

#define MAX 64

int main (int argc, char **argv) {

    char line[MAX] = "";
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (fgets (line, MAX, fp))   /* read each line in file */
    {
        size_t len = strlen (line);         /* get the length */

        if (len && line[len - 1] == '\n')   /* check for trailing '\n' */
            line[--len] = 0;                /* overwrite with nul-byte */

        if (*line == '\t') {                /* if first char is '\t' */
            printf ("\t%s - begins with tab\n", line + 1);
            continue;
        }

        printf ("%s - no tab\n", line);     /* line has no tab */
    }

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    return 0;
}
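As an aside, if you would rather not handle the length yourself, a common alternative to the strlen-based trim in the example is strcspn, which returns the number of characters before the first '\n' (or the full length if there is none). The wrapper name trim_newline below is just mine for illustration.

```c
#include <string.h>

/* hypothetical helper: remove a trailing '\n' from s (if any)
 * and return the resulting length */
size_t trim_newline (char *s)
{
    size_t len = strcspn (s, "\n");     /* chars before the first '\n' */

    s[len] = 0;     /* overwrites '\n', or rewrites the existing nul byte */

    return len;
}
```

Because strcspn returns the full length when no '\n' is present, this is safe on empty strings and on lines that were not terminated by a newline.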
Input File
$ cat dat/tabfile.txt
start mul val1
    ldb #2
    addr A B
    float
loop lda #1
    sta val2
    j loop
val1 word 12
val2 word 0
Example Use/Output
$ ./bin/filehandletab <dat/tabfile.txt
start mul val1 - no tab
    ldb #2 - begins with tab
    addr A B - begins with tab
    float - begins with tab
loop lda #1 - no tab
    sta val2 - begins with tab
    j loop - begins with tab
val1 word 12 - no tab
val2 word 0 - no tab
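If you do need to preserve which tabbed lines belong to which command (the struct idea mentioned earlier), one rough way to sketch it is below. Everything here is an assumption made for illustration: the names cmdblock and read_blocks, the fixed-size arrays (used instead of allocated pointer arrays to keep it short), the MAXSUB cap, and the premise that every line fits in MAXLEN characters.

```c
#include <stdio.h>
#include <string.h>

#define MAXLEN 64   /* assumed: every line fits in MAXLEN - 1 chars  */
#define MAXSUB  8   /* assumed cap on tabbed lines under one command */

/* hypothetical record: one non-tab line plus the tabbed lines below it */
typedef struct {
    char cmd[MAXLEN];
    char sub[MAXSUB][MAXLEN];
    size_t nsub;
} cmdblock;

/* fill blocks[] (at most max entries) from fp, return number stored */
size_t read_blocks (FILE *fp, cmdblock *blocks, size_t max)
{
    char line[MAXLEN];
    size_t n = 0;

    while (fgets (line, sizeof line, fp)) {
        line[strcspn (line, "\n")] = 0;         /* trim trailing '\n' */

        if (*line == '\t') {        /* tabbed: attach to current block */
            cmdblock *b = n ? &blocks[n - 1] : NULL;
            if (b && b->nsub < MAXSUB)          /* else silently dropped */
                strcpy (b->sub[b->nsub++], line + 1);
        }
        else {                      /* no tab: start a new block */
            if (n == max)
                break;
            strcpy (blocks[n].cmd, line);
            blocks[n].nsub = 0;
            n++;
        }
    }

    return n;
}
```

With your input file this would give four blocks: "start mul val1" with three tabbed lines, "loop lda #1" with two, and "val1 word 12" and "val2 word 0" with none. Tabbed lines that appear before any command, or beyond MAXSUB, are simply dropped in this sketch.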
As pointed out in the comments, if your intent was to parse the first word from the lines not beginning with a tab, then you can simply use strchr to locate the first space, temporarily terminate line at the space, make use of the command, and then restore the space so the string can be parsed a second time, if required, e.g.
    while (fgets (line, MAX, fp))
    {
        char *p = NULL;
        size_t len = strlen (line);
        ...
        if (*line == '\t') {                /* if first char is '\t' */
            printf ("\t%s - begins with tab\n", line + 1);
            continue;
        }

        if ((p = strchr (line, ' '))) {     /* find first ' ' */
            *p = 0;                         /* terminate at p */
            printf ("%s - no tab\n", line); /* output first word */
            *p = ' ';                       /* restore ' ' */
        }
        else
            printf ("%s - no tab\n", line); /* no space, output whole line */
    }
Or, using the same termination of line but removing the if...else and the duplicate printf, you could do the following in a bit more compact, but arguably less readable, code (completely up to you):
        if ((p = strchr (line, ' ')))       /* find first ' ' */
            *p = 0;                         /* terminate at p */
        printf ("%s - no tab\n", line);     /* output first word (or whole line) */
        if (p)                              /* if terminated */
            *p = ' ';                       /* restore ' ' */
Example Use/Output
$ ./bin/filehandletab <dat/tabfile.txt
start - no tab
    ldb #2 - begins with tab
    addr A B - begins with tab
    float - begins with tab
loop - no tab
    sta val2 - begins with tab
    j loop - begins with tab
val1 - no tab
val2 - no tab
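If you would rather not modify line at all (not even temporarily), another option is to measure the first word with strcspn and copy just that many characters. This is only a sketch; the helper name first_word and its truncation behavior are my own choices, not part of the code above.

```c
#include <string.h>

/* hypothetical helper: copy the first word of line (everything before
 * the first space, tab or newline) into dst, truncating to fit dstsz.
 * Returns the number of characters copied. line is left untouched. */
size_t first_word (const char *line, char *dst, size_t dstsz)
{
    size_t len = strcspn (line, " \t\n");   /* length of first word */

    if (dstsz == 0)             /* no room for even the nul byte */
        return 0;
    if (len >= dstsz)           /* truncate to fit, leave room for nul */
        len = dstsz - 1;

    memcpy (dst, line, len);
    dst[len] = 0;

    return len;
}
```

If all you want is to print the word, you can skip the copy and pass the length as a precision instead, e.g. printf ("%.*s\n", (int)len, line). Note that for a line beginning with a tab the first word is empty, so you would still test *line == '\t' first, as in the example.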
Look things over and let me know if you have any further questions.