Description of what my function attempts to do
My function gets a string for example "Ab + abc EF++aG hi jkL" and turns it into ["abc", "hi"]
In addition, the function only takes into account letters and the letters all have to be lowercase.
The problem is that
char* str1 = "Ab + abc EF++aG hi jkL";
char* str2 = "This is a very famous quote";
char** tokens1 = get_tokens(str1);
printf("%s", tokens1[0]); <----- prints out "abc" correct output
char** tokens2 = get_tokens(str2);
printf("%s", tokens1[0]); <----- prints out "s" incorrect output
get_tokens function (Returns the 2d array)
char** get_tokens(const char* str) {
// implement me
int num_tokens = count_tokens(str);
char delim[] = " ";
int str_length = strlen(str);
char* new_str = malloc(str_length);
strcpy(new_str, str);
char* ptr = strtok(new_str, delim);
int index = 0;
char** array_2d = malloc(sizeof(char*) *num_tokens);
while (ptr != NULL){
if (check_string(ptr) == 0){
array_2d[index] = ptr;
index++;
}
ptr = strtok(NULL, delim);
}
free(new_str);
new_str = NULL;
free(ptr);
ptr = NULL;
return array_2d;
}
count_tokens function (returns the number of valid strings)
for example count_tokens("AB + abc EF++aG hi jkL") returns 2 because only "abc" and "hi" are valid
int count_tokens(const char* str) {
// implement me
//Seperate string using strtok
char delim[] = " ";
int str_length = strlen(str);
char* new_str = malloc(str_length);
strcpy(new_str, str);
char* ptr = strtok(new_str, delim);
int counter = 0;
while (ptr != NULL){
if (check_string(ptr) == 0){
counter++;
}
ptr = strtok(NULL, delim);
}
free(new_str);
return counter;
}
Lastly check_string() checks if a string is valid
For example check_string("Ab") is invalid because there is a A inside.
using strtok to split "Ab + abc EF++aG hi jkL" into separate parts
int check_string(char* str){
// 0 = false
// 1 = true
int invalid_chars = 0;
for (int i = 0; i<strlen(str); i++){
int char_int_val = (int) str[i];
if (!((char_int_val >= 97 && char_int_val <= 122))){
invalid_chars = 1;
}
}
return invalid_chars;
}
Any help would be much appreciated. Thank you for reading.
If you have any questions about how the code works please ask me. Also I'm new to stackoverflow, please tell me if I have to change something.
You have a few problems in your code. First I'll repeat what I've said in the comments:
strlen
does not include the NUL terminator in its length, so when you dochar* new_str = malloc(str_length);
strcpy(new_str, str);
new_str
overflows by 1 when strcpy
adds the '\0'
, invoking undefined behavior. You need to allocate one extra:
char* new_str = malloc(str_length + 1);
strcpy(new_str, str);
free
any pointer returned from strtok
. You only free
memory that's been dynamically allocated using malloc
and friends. strtok
does no such thing, so it's incorrect to free
the pointer it returns. Doing so also invokes UB.Your final problem is because of this:
// copy str to new_str, that's correct because strtok
// will manipulate the string you pass into it
strcpy(new_str, str);
// get the first token and allocate size for the number of tokens,
// so far so good (but you should check that malloc succeeded)
char* ptr = strtok(new_str, delim);
char** array_2d = malloc(sizeof(char*) *num_tokens);
while (ptr != NULL){
if (check_string(ptr) == 0){
// whoops, this is where the trouble starts ...
array_2d[index] = ptr;
index++;
}
// get the next token, this is correct
ptr = strtok(NULL, delim);
}
// ... because you free new_str
free(new_str);
ptr
is a pointer to some token in new_str
. As soon as you free(new_str)
, Any pointer pointing to that now-deallocated memory is invalid. You've loaded up array_2d
with pointers to memory that's no longer allocated. Trying to access those locations again invokes undefined behavior. There's two ways I can think of off the top to solve this:
new_str
, find the same tokens in str
(the string from main
) and point to those instead. Since those are defined in main
, they will exist for as long as the program exists.strcpy
the token into array_2d[index]
. I'll demonstrate this below:while (ptr != NULL){
if (check_string(ptr) == false)
{
// allocate (enough) memory for the pointer at index
array_2d[index] = malloc(strlen(ptr) + 1);
// you should _always_ check that malloc succeeds
if (array_2d[index] != NULL)
{
// _copy_ the string pointed to by ptr into our new space rather
// than simply assigning the pointer
strcpy(array_2d[index], ptr);
}
else { /* handle no mem error how you want */ }
index++;
}
ptr = strtok(NULL, delim);
}
// now we can safely free new_str without invalidating anything in array_2d
free(new_str);
I have a working demonstration here. Note some other changes in the demo:
#include <stdbool.h>
and used that instead of 0 and 1 int
s.get_tokens
function a bit to "return" the number of tokens. This is useful in main
for printing them out.freedPointer = NULL
lines.int
s to size_t
types for everything involving a size.One final note, while this is a valid implementation, it's probably doing a bit more work than it needs to. Rather than counting the number of tokens in a first pass, then retrieving them in a second pass, you can surely do everything you want in a single pass, but I'll leave that as an exercise to you if you're so inclined.