Tags: c, string, scanf, delimiter

Can I set non-alphabetic characters as a delimiter in C when using fscanf?


I'm trying to read strings from a file using

while(fscanf(fd, "%s ", word) != EOF) {}

where fd is the file and word is where I'm storing the string. However, this effectively uses whitespace as the delimiter. Currently, a file that reads "this% is, the4 str%ng" results in the strings "this%", "is,", "the4", and "str%ng". I need it to produce "this", "is", "the", "str", and "ng". Is it possible to do this with fscanf, or is there something else I need to use?

I saw some answers here and here but they didn't seem to help me out.


Solution

  • Those answers show the use of the %[] format specifier. As an example, suppose you want to read two strings from the console:

    #include <stdio.h>
    
    int main(void){
        char s1[100] = "", s2[100] = "";
        int res;
    
        res = scanf("%99[^%]%%%99[^%]%%", s1, s2);
        printf("%d %s %s\n", res, s1, s2);
    }
    

    The first % starts each format spec, the 99 limits the read to 99 characters so the 100-byte buffer cannot overflow, the ^% tells scanf to stop at %, and the following "escaped" double % tells scanf to consume the % that stopped the scan. It then repeats for the second string, so the format spec for one string is %99[^%]%% .

    To make the format look simpler, suppose the delimiter is # rather than %. The code would then be:

    #include <stdio.h>
    
    int main(void){
        char s1[100] = "", s2[100] = "";
        int res;
    
        res = scanf("%99[^#]#%99[^#]#", s1, s2);
        printf("%d %s %s\n", res, s1, s2);
    }
    

    The function fscanf works the same way; it just takes the FILE * as its first argument.


    EDIT

    The answer above does not handle "unknown" delimiters, so I modified the code to stop at any character from a set of delimiters.

    #include <stdio.h>
    
    int main(void){
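        char s1[100] = "";
        int res;
        // '-' is placed last in the scan set so it is a literal, not a range;
        // space, tab and newline are treated as delimiters too
        while((res = scanf("%99[^!£$%&*()_+={};:'@#~,.<>/?0123456789 \t\n-]", s1)) != EOF) {
            if(res == 1)
                printf("%s\n", s1);         // a word was read
            else
                getchar();                  // skip one delimiter character
        }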
    }
    

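    Note that I have not included the characters ^ or " or [ or ] as delimiters. They can be added, but " must be written as \" in the string literal, and ] must come immediately after the ^ to be taken literally.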