c++ split c-strings strtok string-literals

How to use strtok on char*


In C++, when I split a string on delimiters with strtok, the source has to be a char array; otherwise I get a segmentation fault. How can I use strtok on a pointer to char?

A working example that calls strtok on a char array:

#include <stdio.h>
#include <string.h>

int main () {
  char str[] ="- This, a sample string."; // this is the string I want to split; note that it is an array
  char * pch;
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

Example of what I want to do:

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char* str ="- This, a sample string."; // since this is a pointer to char, the program compiles but gives a segmentation fault when executed
  char * pch;
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

Solution

  • You are trying to modify a string literal: the function strtok changes the source string by overwriting delimiters with null characters '\0'.

    char* str ="- This, a sample string.";
    

    First of all, in C++, unlike in C, string literals have types of constant character arrays. So in a C++ program you have to declare the pointer with the qualifier const.

    const char* str ="- This, a sample string.";
    

    Any attempt to change a string literal in C and C++ results in undefined behavior.

    For example, the C Standard says (6.4.5 String literals):

    7 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

    So even in C it is always better to declare pointers to string literals with the qualifier const.
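    If you do want to keep strtok, it needs writable storage. One way, sketched here under the assumption that copying the text is acceptable, is to copy the characters into a buffer the program owns and tokenize that copy. The helper name tokenize_copy is hypothetical and not part of any library.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): tokenizes a read-only
// C string by running strtok on a private, writable copy of it.
std::vector<std::string> tokenize_copy(const char *text, const char *delims)
{
    assert(text != nullptr && delims != nullptr);

    // Copy the characters, including the terminating '\0', into storage we own.
    std::vector<char> buffer(text, text + std::strlen(text) + 1);

    std::vector<std::string> tokens;
    for (char *tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        // Copy each token out, because the buffer is destroyed on return.
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

    Calling tokenize_copy("- This, a sample string.", " ,.-") should give the four tokens This, a, sample and string. Note that strtok keeps internal state between calls, so this helper is not safe to call from several threads at once.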

    Instead of strtok you could use, for example, the standard C string functions strspn and strcspn, which do not modify the source string.
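    For readers unfamiliar with these two functions, the following minimal sketch shows what each one contributes when locating a single token. The struct TokenSpan and the function first_token_span are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type: where the first token starts and how long it is.
struct TokenSpan
{
    std::size_t offset;
    std::size_t length;
};

// Hypothetical helper: locates the first token of s without modifying s.
TokenSpan first_token_span(const char *s, const char *delim)
{
    assert(s != nullptr && delim != nullptr);

    // strspn: length of the leading run consisting only of delimiter characters.
    std::size_t offset = std::strspn(s, delim);

    // strcspn: length of the following run that contains no delimiter character.
    std::size_t length = std::strcspn(s + offset, delim);

    return TokenSpan{ offset, length };
}
```

    For "- This, a sample string." with the delimiters " ., -", the sketch yields offset 2 and length 4, which is the word This. A length of 0 means that no token was found.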

    Here is a demonstration program.

    #include <iostream>
    #include <iomanip>
    #include <string_view>
    #include <cstring>
    
    int main()
    {
        const char *s = "- This, a sample string.";
        const char *delim = " ., -";
    
        for (const char *p = s; *( p += strspn( p, delim ) ) != '\0'; )
        {
            auto n = strcspn( p, delim );
    
            std::string_view sv( p, n );
    
            std::cout << std::quoted( sv ) << ' ';
    
            p += n;
        }
    
        std::cout << '\n';
    }
    

    The program output is

    "This" "a" "sample" "string"
    

    You could, for example, declare a vector of string views, std::vector<std::string_view>, and store each substring in it.

    For example

    #include <iostream>
    #include <iomanip>
    #include <string_view>
    #include <vector>
    #include <cstring>
    
    int main()
    {
        const char *s = "- This, a sample string.";
        const char *delim = " ., -";
    
        std::vector<std::string_view> v;
    
        for (const char *p = s; *( p += strspn( p, delim ) ) != '\0'; )
        {
            auto n = strcspn( p, delim );
    
            v.emplace_back( p, n );
    
            p += n;
        }
    
        for (auto sv : v)
        {
            std::cout << std::quoted( sv ) << ' ';
        }
        std::cout << '\n';
    }
    

    The program output is the same as shown above.

    Or, if the compiler does not support C++17, then instead of a vector of the type std::vector<std::string_view> you can use a vector of the type std::vector<std::pair<const char *, size_t>>.

    For example

    #include <iostream>
    #include <iomanip>
    #include <utility>
    #include <vector>
    #include <cstring>
    
    int main()
    {
        const char *s = "- This, a sample string.";
        const char *delim = " ., -";
    
        std::vector<std::pair<const char *, size_t>> v;
    
        for (const char *p = s; *( p += strspn( p, delim ) ) != '\0'; )
        {
            auto n = strcspn( p, delim );
    
            v.emplace_back( p, n );
    
            p += n;
        }
    
        for (auto p : v)
        {
            std::cout.write( p.first, p.second ) << ' ';
        }
        std::cout << '\n';
    }
    

    The program output is

    This a sample string
    

    Or you could use a vector of objects of the type std::string: std::vector<std::string>.
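    A minimal sketch of the std::vector<std::string> variant, using the same strspn/strcspn loop as in the programs above. Each token is copied, so the result stays valid even after the source text is gone. The function name split_to_strings is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits s on any character in delim and returns
// owning copies of the tokens. The source string is never modified.
std::vector<std::string> split_to_strings(const char *s, const char *delim)
{
    assert(s != nullptr && delim != nullptr);

    std::vector<std::string> v;

    // Skip leading delimiters; stop when the end of the string is reached.
    for (const char *p = s; *(p += std::strspn(p, delim)) != '\0'; )
    {
        std::size_t n = std::strcspn(p, delim); // length of the current token

        v.emplace_back(p, n); // constructs std::string from n characters at p

        p += n;
    }

    return v;
}
```

    With the string and delimiters from the question, split_to_strings returns the tokens This, a, sample and string.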

    In C you can use a variable length array or a dynamically allocated array whose element type is a structure with two data members of the types const char * and size_t, similar to the C++ class std::pair. But to define the array you first need to count how many words there are in the string literal, using the same for loop.

    Here is a C demonstration program.

    #include <stdio.h>
    #include <string.h>
    
    int main( void )
    {
        const char *s = "- This, a sample string.";
        const char *delim = " ., -";
    
        size_t nmemb = 0;
    
        for (const char *p = s; *( p += strspn( p, delim ) ) != '\0'; )
        {
            ++nmemb;
            size_t n = strcspn( p, delim );
            p += n;
        }    
    
        struct SubString
        {
            const char *pos;
            size_t size;
        } a[nmemb];
    
        size_t i = 0;
    
        for (const char *p = s; *( p += strspn( p, delim ) ) != '\0'; )
        {
            size_t n = strcspn( p, delim );
    
            a[i].pos = p;
            a[i].size = n;
            ++i;
            p += n;
        }
    
        for ( i = 0; i < nmemb; i++ )
        {
            printf( "%.*s ", ( int )a[i].size, a[i].pos );
        } 
    
        putchar( '\n' );   
    }
    

    The program output is

    This a sample string