Skipping specific lines and storing specific variables in C


This is a simple question but I just can't seem to figure it out. I need a little help. I have a file called programFile that looks like this:

start mul val1
<--tab-->ldb #2
<--tab-->addr A B
<--tab-->float
loop lda #1
<--tab-->sta val2
<--tab-->j loop
val1 word 12
val2 word 0

I want to take only the lines that do not start with a tab and do something with them (for now I just want to print the first word of each such line), that is, only the lines "start mul val1", "loop lda #1", "val1 word 12" and "val2 word 0". The output should be:

start
loop
val1
val2

Eventually, I also want to take the lines that start with the tab and do something different with them, but this is my poor attempt at solving it:

while(ch = fgetc(programFile) != EOF){
    if(ch == '\t'){
        while(ch != '\n'){
            ch = fgetc(programFile);
        }
    }else{
        fscanf(programFile, "%s", symbol);
        printf("%s\n", symbol);
    }
}

And this is my output:

tart
ldb
addr
float
loop
sta
j
val1
val2

Solution

  • Continuing from the comment, while there is nothing wrong with using character-oriented input functions (e.g. fgetc, getc), when you need to handle "lines" of data you are far better served by the line-oriented input functions provided by the C library (fgets) or POSIX (getline), and then parsing the needed information out of each line.

    Why? Basically, convenience and efficiency. A line-oriented input function hands you a whole line per call instead of one character per call, which for large input files can noticeably reduce the per-character call overhead (the stdio stream itself is buffered either way). Next, regardless of the contents, you read the whole line, provided there is sufficient storage for the line when using fgets; otherwise you read successive chunks until the complete line has been consumed. getline automatically allocates (and reallocates) sufficient storage to hold each line.
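
    As a rough sketch of the getline side of that comparison, assuming a POSIX system (getline is not part of standard C): the helper name count_untabbed is hypothetical and only there to show the allocate-once, free-once pattern.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* count_untabbed is a hypothetical helper: it counts the lines in fp
 * that do not begin with a tab, letting getline size the buffer.
 */
size_t count_untabbed (FILE *fp)
{
    char *line = NULL;      /* NULL + n == 0 tells getline to allocate   */
    size_t n = 0;           /* allocation size, maintained by getline    */
    ssize_t nchr;           /* characters read, or -1 on EOF/error       */
    size_t count = 0;

    while ((nchr = getline (&line, &n, fp)) != -1)
        if (nchr > 0 && *line != '\t')  /* first char is not a tab */
            count++;

    free (line);            /* single free: getline reuses the buffer */
    return count;
}
```

    No line-length limit has to be chosen up front here; the trade-off is that the caller owns the buffer and must free it.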

    You then have tools like sscanf, strtok, strsep, strstr, strchr, etc. with which you can parse whatever you need from the stored line. (You can also always use simple pointer arithmetic to parse any line with a pointer, or pair of pointers, "walking the string" and comparing each character as you go.) In-memory operations on each character in a stored string are typically much faster than performing the same operation while doing file I/O on each character at the same time.
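
    To make the "walking the string" idea concrete, here is a minimal sketch using a pair of pointers. The helper name first_word and its signature are made up for illustration; it treats space, tab and newline as delimiters.

```c
#include <string.h>

/* first_word is a hypothetical helper: copies the first whitespace-delimited
 * word of line into out (at most outsz - 1 chars, always nul-terminated)
 * and returns the number of characters copied.
 */
size_t first_word (const char *line, char *out, size_t outsz)
{
    const char *p = line;   /* will point at the start of the word */
    const char *ep;         /* will point one past the end of it   */
    size_t len;

    if (outsz == 0)
        return 0;

    while (*p == ' ' || *p == '\t')     /* skip leading blanks */
        p++;

    ep = p;
    while (*ep && *ep != ' ' && *ep != '\t' && *ep != '\n')
        ep++;                           /* walk to end of word */

    len = (size_t)(ep - p);
    if (len > outsz - 1)
        len = outsz - 1;                /* truncate to fit out */

    memcpy (out, p, len);
    out[len] = 0;
    return len;
}
```

    Unlike temporarily overwriting a character in the line, this leaves the stored line untouched, at the cost of a second buffer.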

    When you are concerned about the beginning character of each line, you only need compare line[0] (or simply *line) against whatever character you are looking for.

    The following is a simple example that reads from the input filename given as the first argument (or from stdin by default if no filename is given) and then tests the beginning character of each line. If the line begins with a tab, it outputs the line preceded by a tab (having skipped the tab read from the file by outputting line + 1) followed by "- begins with tab" (you can handle those lines however you like, or skip them entirely); otherwise it outputs the line itself followed by "- no tab". The handling of the differently prefixed lines is completely up to you. You can build an array of pointers holding each different type of line, or use a struct containing the commands and tabbed content in pointer arrays to preserve the line associations (which commands go with which tabbed lines), if required.

    The only other note on line-oriented input functions is that they read up to and including the trailing '\n'. You generally do not want to store strings with newlines dangling off the ends, so you will want to trim the newlines by overwriting the trailing '\n' with the nul-terminating character. The example does this by getting the length of each line with strlen and then overwriting the newline with 0 (which is equivalent to the character '\0'). I don't like to type...

    #include <stdio.h>
    #include <string.h>
    
    #define MAX 64
    
    int main (int argc, char **argv) {
    
        char line[MAX] = "";
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    
        if (!fp) {  /* validate file open for reading */
            fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
            return 1;
        }
    
        while (fgets (line, MAX, fp))   /* read each line in file  */
        {
            size_t len = strlen (line); /* get the length */
    
            if (len && line[len - 1] == '\n')  /* check for trailing '\n' (len may be 0) */
                line[--len] = 0;        /* overwrite with nul-byte */
    
            if (*line == '\t') {        /* if first char is '\t'   */
                printf ("\t%s - begins with tab\n", line + 1);
                continue;
            }
    
            printf ("%s - no tab\n", line);     /* line has no tab */
        }
    
        if (fp != stdin) fclose (fp);   /* close file if not stdin */
    
        return 0;
    }
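
    As an alternative to the strlen-based trim used above, strcspn can locate the line ending in one call and is safe on an empty string. This is only a sketch: the helper name trim_newline is hypothetical, and also stripping '\r' is an added assumption for files with Windows line endings.

```c
#include <string.h>

/* trim_newline is a hypothetical helper: ends line at the first '\r' or
 * '\n' (if any) and returns the resulting length.
 */
size_t trim_newline (char *line)
{
    /* strcspn returns the count of leading chars that are NOT in the
     * reject set; with no match that is strlen (line), so the write
     * below simply rewrites the existing nul byte.
     */
    size_t len = strcspn (line, "\r\n");

    line[len] = 0;
    return len;
}
```

    Inside the read loop this would replace the strlen call and the if that follows it, e.g. size_t len = trim_newline (line);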
    

    Input File

    $ cat dat/tabfile.txt
    start mul val1
            ldb #2
            addr A B
            float
    loop lda #1
            sta val2
            j loop
    val1 word 12
    val2 word 0
    

    Example Use/Output

    $ ./bin/filehandletab <dat/tabfile.txt
    start mul val1 - no tab
            ldb #2 - begins with tab
            addr A B - begins with tab
            float - begins with tab
    loop lda #1 - no tab
            sta val2 - begins with tab
            j loop - begins with tab
    val1 word 12 - no tab
    val2 word 0 - no tab
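
    If you do need to keep track of which tabbed lines belong to which untabbed line, one possible shape for the struct mentioned earlier is sketched here. Everything in it (struct block, group_lines, the MAXLEN and MAXINSTR limits) is hypothetical, and it uses fixed-size arrays rather than pointer arrays to stay short.

```c
#include <stdio.h>
#include <string.h>

#define MAXLEN   64     /* max chars per stored line (assumed limit)   */
#define MAXINSTR  8     /* max tabbed lines per block (assumed limit)  */

/* hypothetical record: one untabbed line plus the tabbed lines after it */
struct block {
    char head[MAXLEN];
    char body[MAXINSTR][MAXLEN];
    size_t nbody;
};

/* group_lines is a hypothetical helper: fills blocks[] from fp and returns
 * the number of blocks stored. It stops reading once maxblocks is reached;
 * tabbed lines before the first untabbed line, or beyond MAXINSTR, are
 * dropped. Lines longer than MAXLEN - 1 chars are not handled here.
 */
size_t group_lines (FILE *fp, struct block *blocks, size_t maxblocks)
{
    char line[MAXLEN];
    size_t n = 0;

    while (fgets (line, MAXLEN, fp)) {
        line[strcspn (line, "\n")] = 0;     /* trim trailing '\n' */

        if (*line == '\t') {                /* tabbed: attach to current block */
            if (n > 0 && blocks[n - 1].nbody < MAXINSTR) {
                struct block *b = &blocks[n - 1];
                strcpy (b->body[b->nbody++], line + 1);
            }
        }
        else {                              /* untabbed: start a new block */
            if (n == maxblocks)
                break;                      /* out of room: stop reading */
            strcpy (blocks[n].head, line);
            blocks[n].nbody = 0;
            n++;
        }
    }
    return n;
}
```

    With the sample file this would give four blocks, the first holding "start mul val1" with three tabbed lines attached and the last two holding none.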
    

    As pointed out in the comments, if your intent was to parse the first word from the lines not beginning with a tab, then you can simply use strchr to locate the first space, temporarily terminate line at the space, make use of the command, and then restore the space so the string can be parsed further, if required, e.g.

    while (fgets (line, MAX, fp)) 
    {
        char *p = NULL;
        size_t len = strlen (line);
        ...
        if (*line == '\t') {        /* if first char is '\t'   */
            printf ("\t%s - begins with tab\n", line + 1);
            continue;
        }
    
        if ((p = strchr (line, ' '))) {      /* find first ' ' */
            *p = 0;                          /* terminate at p */
            printf ("%s - no tab\n", line);  /* output line    */
            *p = ' ';                        /* restore ' '    */
        }
        else
            printf ("%s - no tab\n", line);    /* line has no tab */
    }
    

    Or, keeping the same termination of line but removing the if...else and the duplicate printf, you could write the following slightly more compact, but arguably less readable, code (completely up to you):

        if ((p = strchr (line, ' ')))        /* find first ' ' */
            *p = 0;                          /* terminate at p */
    
        printf ("%s - no tab\n", line);      /* line has no tab */
    
        if (p)                               /* if terminated  */
            *p = ' ';                        /* restore ' '    */
    

    Example Use/Output

    $ ./bin/filehandletab <dat/tabfile.txt
    start - no tab
            ldb #2 - begins with tab
            addr A B - begins with tab
            float - begins with tab
    loop - no tab
            sta val2 - begins with tab
            j loop - begins with tab
    val1 - no tab
    val2 - no tab
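
    For completeness, the character-oriented loop from the question can be made to work as well. Two things stand out in the posted code: ch = fgetc(programFile) != EOF parses as ch = (fgetc(programFile) != EOF), so ch holds 0 or 1 rather than the character read, and the fgetc has already consumed the first character of the line before fscanf runs, which is consistent with "tart" appearing instead of "start". The sketch below is one possible repair, not the only one. The name print_labels is hypothetical, and it assumes that a line starting with any whitespace (not only a tab) should be skipped.

```c
#include <ctype.h>
#include <stdio.h>

/* print_labels is a hypothetical helper: for each line of in that begins
 * with a non-whitespace character, writes that line's first word to out.
 * Returns the number of words written.
 */
size_t print_labels (FILE *in, FILE *out)
{
    char symbol[64];
    int ch;                 /* int, not char, so EOF stays distinguishable */
    size_t n = 0;

    while ((ch = fgetc (in)) != EOF) {      /* parentheses around assignment */
        if (!isspace (ch)) {
            ungetc (ch, in);                /* put the first char back */
            if (fscanf (in, "%63s", symbol) == 1) {     /* bounded read */
                fprintf (out, "%s\n", symbol);
                n++;
            }
        }
        while (ch != '\n' && ch != EOF)     /* discard the rest of the line */
            ch = fgetc (in);
    }
    return n;
}
```

    Calling print_labels (fp, stdout) on the sample file prints start, loop, val1 and val2, one per line, which is the output asked for in the question.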
    

    Look things over and let me know if you have any further questions.