I have been going at this for hours, and I can't figure out what the problem in my code is. I'm currently writing a very simple assembler for a custom instruction set architecture. The assembler takes an input file and parses it line by line. During parsing, I intend to split each line on spaces and write the tokens to an array for processing. Below is some of the code that does that:
char** tokens = (char**) malloc(sizeof(char*));
char* linecpy = strcpy(linecpy, line);
char* tok_ptr = strtok(linecpy, " ");
int tokenid = 0;
while(tok_ptr) {
tokens = (char**) realloc(tokens, (tokenid+1) * sizeof(char*));
tokens[tokenid] = tok_ptr;
tokenid++;
tok_ptr = strtok(NULL, " ");
}
To check that this works correctly, I print each token from the array in order, and I'm finding random splits in the middle of tokens that shouldn't be there. Here is an example:
Line from assembly file:
jsr fibloop ; jump to the main program loop
Expected Output from splitting by spaces:
jsr
fibloop
;
jump
to
the
main
program
loop
Actual Output:
jsr
fibloo
p
;
jump
to
the
main
pr
ogram
l
oop
I've spent a long time trying to solve this to no avail, and any feedback on how to solve it would be greatly appreciated.
EDIT: The solution was pointed out by Clifford and 4386427. The problem was that linecpy had no memory allocated to it, and strcpy does not return a newly allocated string as I had incorrectly assumed. The working code is below. I've also included a comment filter that stops tokenization once the parser hits a comment character, as Clifford suggested.
char** tokens = (char**) malloc(sizeof(char*));
char* linecpy = malloc(strlen(line) + 1);
strcpy(linecpy, line);
char* tok_ptr = strtok(linecpy, " ");
int tokenid = 0;
while(tok_ptr) {
/*
If a token starts with a comment character then we stop tokenization,
as everything after will be commented and is of no use to the parser
*/
if(tok_ptr[0] == ';') break;
tokens = (char**) realloc(tokens, (tokenid+1) * sizeof(char*));
tokens[tokenid] = tok_ptr;
tokenid++;
tok_ptr = strtok(NULL, " ");
}
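// free the token array and the line copy after parsing;
// the tokens point into linecpy, so free it only once they are no longer needed
free(tokens);
free(linecpy);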
Hopefully this helps anyone with the same problem I had. The quick responses from members of this community were extremely helpful. Thanks, guys!
char* linecpy = strcpy(linecpy, line);
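is illegal: linecpy is uninitialized and has no memory allocated to it, so strcpy writes through an indeterminate pointer, which is undefined behavior. You need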
char* linecpy = malloc(strlen(line) + 1);
strcpy(linecpy, line);
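To make the point about strcpy concrete, here is a minimal sketch of the allocate-then-copy pattern wrapped in a function. The name copy_string is hypothetical and not part of the original code. It only illustrates that strcpy hands back the destination pointer it was given and never allocates anything itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns a
   heap-allocated copy of src, or NULL if the allocation fails.
   The caller is responsible for calling free() on the result. */
char *copy_string(const char *src)
{
    assert(src != NULL);

    /* +1 leaves room for the terminating '\0' */
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL)
        return NULL;

    /* strcpy copies into dst and returns dst; it allocates nothing */
    char *ret = strcpy(dst, src);
    assert(ret == dst);

    return dst;
}
```

On POSIX systems strdup does roughly the same job in one call, but the explicit version above relies only on standard C library functions.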
Besides that:
char** tokens = (char**) malloc(sizeof(char*));
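should be the following, because realloc called with a null pointer behaves like malloc, so the initial allocation is unnecessary: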
char** tokens = NULL;
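Putting both points together, the tokenizer could be sketched roughly as below. This is only one possible arrangement, not the asker's actual code: the function name split_line and its out-parameters are assumptions made for illustration. It also checks the allocation results and keeps the comment filter from the question's edit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (names are illustrative, not from the post).
   Splits line on spaces and stops at the first token starting with ';'.
   On success returns 0 and stores the token array, the writable copy
   the tokens point into, and the token count in the out-parameters.
   The caller frees both *tokens_out and *copy_out.
   Returns -1 if an allocation fails. */
int split_line(const char *line, char ***tokens_out,
               char **copy_out, size_t *count_out)
{
    assert(line != NULL);

    char **tokens = NULL;  /* realloc(NULL, n) behaves like malloc(n) */
    size_t count = 0;

    /* strtok modifies its input, so tokenize a private copy */
    char *copy = malloc(strlen(line) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, line);

    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (tok[0] == ';')
            break;  /* the rest of the line is a comment */

        /* grow through a temporary so tokens is not lost on failure */
        char **grown = realloc(tokens, (count + 1) * sizeof *tokens);
        if (grown == NULL) {
            free(tokens);
            free(copy);
            return -1;
        }
        tokens = grown;
        tokens[count++] = tok;
    }

    *tokens_out = tokens;
    *copy_out = copy;
    *count_out = count;
    return 0;
}
```

Note that the tokens are pointers into the copy, so the copy has to stay alive for as long as the tokens are used, and both allocations are released afterwards with free.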