I'm trying to tokenise an input file using strtok. The input file is:
InputVector:0(0,3,4,2,40)
I'm trying to read the numbers in, but I ran into some unexpected behaviour that I don't understand. My tokenising code looks like this:
#define INV_DELIM1 ":"
#define INV_DELIM2 "("
#define INV_DELIM3 ",)"

checkBuff = fgets(buff, sizeof(buff), (FILE*)file);
if(checkBuff == NULL)
{
    printf("fgets failure\n");
    return FALSE;
}
else if(buff[strlen(buff) - 1] != '\n')
{
    printf("InputVector String too big or didn't end with a new line\n");
    return FALSE;
}
else
{
    buff[strlen(buff) - 1] = '\0';
}

token = strtok(buff, INV_DELIM1);
printf("token %s", token);
token = strtok(buff, INV_DELIM2);
printf("token %s", token);
while(token != NULL) {
    token = strtok(NULL, INV_DELIM3);
    printf("token %s\n", token);
    if(token != NULL) {
        number = strtol(token, &endptr, 10);
        if((token == endptr || *endptr != '\0')) {
            printf("A token is Not a number\n");
            return FALSE;
        }
        else {
            vector[i] = number;
            i++;
        }
    }
}
output:
token InputVector
token 0
token 0
token 3
token 4
token 2
token 40
token
So the code first calls fgets and checks that the line is not longer than my buffer; if it fits, it replaces the last character with '\0'.
Then I tokenise the first word and the number outside the brackets. The while loop tokenises the numbers inside the brackets, converts them using strtol and puts them into an array. I'm trying to use strtol to detect whether the data inside the brackets is numerical, but it always reports an error because strtok reads a last token which isn't in the input. How do I stop that last token from being read so that my strtol doesn't pick it up? Or is there a better way to tokenise and check the values inside the brackets?
The input file will later contain more than one input vector, and I have to be able to check whether they're valid or not.
The most likely explanation is that your input line ends with the Windows newline sequence \r\n. If your program runs on Unix (or Linux) and you are typing your input on Windows, Windows will send the two-character newline sequence but the Unix program won't know that it needs to do line-end translation. (If you ran the program directly on the Windows system, the standard I/O library would deal with the newline sequence for you by translating it to a single \n, as long as you don't open the file in binary mode.)
Since \r is not in your delimiter list, strtok will treat it as an ordinary character, so your last field will consist of the \r. Printing it out is not quite a no-op, but it's invisible, so it's easy to get fooled into thinking that an empty field is being printed. (The same would happen if the field consisted only of spaces.)
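One way to confirm that the "empty" token really holds a \r is to print its bytes in a visible form. The following is only a sketch; show_bytes is a hypothetical helper name, not something from the question's code, and it assumes the output buffer is large enough for the escaped text.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `s` into `out`, writing every
   non-printable byte as \xNN so that a stray '\r' shows up
   as \x0D instead of being invisible. Output is truncated
   if `out` is too small. */
static void show_bytes(const char *s, char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0) return;
    out[0] = '\0';
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        char piece[5];                      /* "\xNN" plus terminator */
        size_t len;

        if (isprint(c))
            snprintf(piece, sizeof piece, "%c", c);
        else
            snprintf(piece, sizeof piece, "\\x%02X", c);
        len = strlen(piece);
        if (used + len >= outsz) break;     /* no room left */
        memcpy(out + used, piece, len + 1); /* copy including '\0' */
        used += len;
    }
}
```

Calling show_bytes(token, out, sizeof out) and printing out inside the loop would make a carriage return appear as \x0D, while an ordinary token such as 40 is printed unchanged.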
You could just add \r to your delimiter list. Indeed, you could add both \n and \r to the delimiter list in your strtok call, and then you wouldn't need to worry about trimming the input line. That will work because strtok treats any sequence of delimiter characters as a single delimiter.
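As a rough illustration of the delimiter-list approach, here is a sketch that parses one line with \r and \n added to the last delimiter set. The function name read_vector, its parameters and its return convention are assumptions made for this example; error reporting is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a line such as
   "InputVector:0(0,3,4,2,40)\r\n" in place.
   Returns the number of values stored in `vector`,
   or -1 if the line is malformed or holds more than `max` values. */
static int read_vector(char *line, long *vector, int max)
{
    char *token;
    char *endptr;
    int count = 0;

    if (strtok(line, ":") == NULL) return -1;  /* name before the colon */
    if (strtok(NULL, "(") == NULL) return -1;  /* number before the '(' */

    /* '\r' and '\n' are delimiters here, so a trailing newline
       sequence never turns into a token of its own. */
    while ((token = strtok(NULL, ",)\r\n")) != NULL) {
        long number = strtol(token, &endptr, 10);
        if (token == endptr || *endptr != '\0') return -1; /* not a number */
        if (count == max) return -1;                        /* vector full */
        vector[count++] = number;
    }
    return count;
}
```

Because the loop tests the result of strtok before calling strtol, the NULL that ends the token sequence is never passed to strtol.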
However, that may not really be what you want, since that will hide certain input errors. For example, if the input had two consecutive commas, strtok would treat them as a single comma, and you would never know that the field was skipped. You could solve that particular problem by using strspn instead of strtok, but I personally think the better solution is to not use strtok at all since strtol will tell you where the line ends.
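The way strtok hides an empty field can be seen with a small counting function. This is a sketch for illustration only; count_tokens is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: count the tokens strtok finds in `s`.
   Note that strtok modifies `s`, so it must be a writable array. */
static int count_tokens(char *s, const char *delims)
{
    int n = 0;
    char *t;

    for (t = strtok(s, delims); t != NULL; t = strtok(NULL, delims))
        ++n;
    return n;
}
```

With the input "0,3,,2" and the delimiter "," this counts three tokens, not four: the empty field between the two commas is silently dropped, so the caller cannot tell that a value was missing.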
For example, something like the following. (For simplicity, I left out printing of error messages. It's not necessary to check whether the line ends with a newline before this code; if you feel it necessary to do that check, you can do it after you find the close parenthesis at the end of the loop.)
#include <ctype.h>   /* For 'isspace' */
#include <stdbool.h> /* For 'false' */
#include <stdlib.h>  /* For 'strtol' */
#include <string.h>  /* For 'strchr' */
// ...
char* token = strchr(buff, ':');          /* Find the colon */
if (token == NULL) return false;          /* No colon */
++token;                                  /* Character after the colon */
char* endptr;
(void)strtol(token, &endptr, 10);         /* Read and toss away a number */
if (endptr == token) return false;        /* No number */
token = endptr;                           /* Character following number */
/* The cast avoids undefined behaviour if 'char' is signed and negative */
while (isspace((unsigned char)*token)) ++token; /* Skip spaces (maybe not necessary) */
if (*token != '(') return false;          /* Wrong delimiter */
for (i = 0; i < n_vector; ++i) {          /* Loop until vector is full or ')' is found */
    ++token;                              /* Skip the '(' or ',' */
    vector[i] = strtol(token, &endptr, 10); /* Get another number */
    if (endptr == token) return false;    /* No number */
    token = endptr;                       /* Character following number */
    while (isspace((unsigned char)*token)) ++token; /* Skip spaces */
    if (*token == ')') break;             /* Found the close parenthesis */
    if (*token != ',') return false;      /* Not the right delimiter */
}                                         /* Loop */
/* At this point, either we found the ')' or we read too many numbers */
if (*token != ')') return false;          /* Too many numbers */
/* Could check to make sure the following characters are a newline sequence */
/* ... */
The code which calls strtol to get a number and then checks what the delimiter is should be refactored, but I wrote it out like that for simplicity. I would normally use a function which reads a number and returns the delimiter (as with getchar()) or EOF if the end of the buffer is encountered. But it would depend on your precise needs.
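A function of that kind might look roughly like the sketch below. The name read_number, its signature and the exact EOF convention are assumptions for illustration; the real design would depend on how errors need to be reported.

```c
#include <ctype.h>
#include <stdio.h>   /* For 'EOF' */
#include <stdlib.h>  /* For 'strtol' */

/* Hypothetical helper: read one number starting at *pos and store it
   in *value. On success, advance *pos past the delimiter that follows
   the number and return that delimiter. Return EOF if there is no
   number at *pos, or if the buffer ends right after the number. */
static int read_number(const char **pos, long *value)
{
    char *endptr;

    *value = strtol(*pos, &endptr, 10);
    if (endptr == *pos) return EOF;                    /* No number */
    while (isspace((unsigned char)*endptr)) ++endptr;  /* Skip spaces */
    if (*endptr == '\0') {                             /* End of buffer */
        *pos = endptr;
        return EOF;
    }
    *pos = endptr + 1;                                 /* Step over delimiter */
    return (unsigned char)*endptr;
}
```

The caller can then loop while the returned delimiter is ',' and stop when it sees ')', treating any other return value, including EOF, as a malformed line.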