Conversion of string.endswith method into C


I am beginning a personal project of converting an interpreter written in python into C. It is purely for learning purposes.

The first thing I have come across is trying to convert the following:

if __name__ == "__main__":
    if not argv[-1].endswith('.py'):
        ...

And I have done the following conversion thus far for the endswith method

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

bool endswith(char* str, char* substr)
{
    // case1: one of the pointers is NULL
    if (!str || !substr) return false;

    char* start_of_substring = strstr(str, substr);

    // case2: substr does not occur in str
    if (!start_of_substring) return false;

    size_t length_of_string    = strlen(str);
    size_t length_of_substring = strlen(substr);
    size_t index_of_match      = start_of_substring - str;

    // case3: check whether the match is at the end
    return (length_of_string == length_of_substring + index_of_match);

}

int main(int argc, char* argv[])
{
    char *last_arg = argv[argc-1];
    if (endswith(last_arg, ".py")) {
        // ...
    } 

}

Does this look like it's covering all the cases in an endswith, or am I missing some edge cases? If so, how can this be improved and such? Finally, this isn't a criticism but more a genuine question in writing a C application: is it common that writing C will require 5-10x more code than doing the same thing in python (or is that more because I'm a beginner and don't know how to do things properly?)

And related: https://codereview.stackexchange.com/questions/54722/determine-if-one-string-occurs-at-the-end-of-another/54724


Solution

  • Does this look like it's covering all the cases in an endswith, or am I missing some edge cases?

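    You are missing at least the case where the substring appears more than once, with one of the occurrences at the end. strstr() returns a pointer to the first occurrence only, so for a string such as "a.py.py" the match is found at index 1, the length test fails, and the function returns false even though the string does end with ".py".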
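
To make that failure concrete, here is a small sketch. It reproduces the question's strstr()-based logic under a hypothetical name, endswith_strstr, and records its results as assertions; the function show_strstr_pitfalls and the sample strings are my own illustration, not part of the question. The last assertion covers a second difference I believe is worth knowing: Python's "abc".endswith("") is True, while the strstr() version reports false for an empty suffix on a non-empty string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same logic as the question's version, under a hypothetical name. */
bool endswith_strstr(const char *str, const char *substr)
{
    if (!str || !substr) return false;

    /* strstr() finds the FIRST occurrence of substr, not the last. */
    const char *first = strstr(str, substr);
    if (!first) return false;

    return strlen(str) == strlen(substr) + (size_t)(first - str);
}

/* Hypothetical self-check showing where the approach goes wrong. */
void show_strstr_pitfalls(void)
{
    /* Single occurrence at the end: handled correctly. */
    assert(endswith_strstr("script.py", ".py"));

    /* Two occurrences: the first one is at index 1, so the length
       test fails even though the string does end with ".py". */
    assert(!endswith_strstr("a.py.py", ".py"));

    /* Empty suffix: strstr() matches at index 0, and 3 != 0 + 0. */
    assert(!endswith_strstr("abc", ""));
}
```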

    I wouldn't use strstr() for this. Instead, I would determine from the relative lengths of the two strings where in the main string to look, and then use strcmp(). Example:

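    bool endswith(const char* str, const char* substr) {
        // NULL pointers never match
        if (!str || !substr) return false;

        size_t length_of_string    = strlen(str);
        size_t length_of_substring = strlen(substr);

        // a suffix longer than the string cannot match
        if (length_of_substring > length_of_string) return false;

        // compare the tail of str that has the same length as substr
        return (strcmp(str + length_of_string - length_of_substring, substr) == 0);
    }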
    

    With regard to that return statement: str + length_of_string - length_of_substring is equivalent to &str[length_of_string - length_of_substring] -- that is, a pointer to the first character of the trailing substring that has the same length as substr. The strcmp function compares two C strings, returning an integer less than, equal to, or greater than zero depending on whether the first argument is lexicographically less than, equal to, or greater than the second. In particular, strcmp() returns 0 when its arguments are equal, and this function returns the result of exactly such a test.
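
If the pointer arithmetic is unfamiliar, the following sketch isolates it. The helper name tail and the function tail_examples are hypothetical, introduced only for illustration; the assertions show that the pointer expression and the &str[...] form name the same address, and how the sign of strcmp()'s result behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pointer to the last n characters of str,
   or NULL when str is shorter than n. */
const char *tail(const char *str, size_t n)
{
    size_t len = strlen(str);
    if (n > len) return NULL;
    return str + len - n;   /* same address as &str[len - n] */
}

void tail_examples(void)
{
    const char *name = "main.py";   /* length 7 */

    /* 7 - 3 == 4, so the tail starts at name[4], the '.' */
    assert(tail(name, 3) == &name[4]);

    /* Equal strings compare as 0 ... */
    assert(strcmp(tail(name, 3), ".py") == 0);

    /* ... and ".py" sorts before ".pz", so the result is negative. */
    assert(strcmp(tail(name, 3), ".pz") < 0);

    /* Asking for more characters than the string has yields NULL. */
    assert(tail(name, 8) == NULL);
}
```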

  • is it common that writing C will require 5-10x more code than doing the same thing in python

    Python is a higher-level language than C, so it is common for C code for a task to be lengthier than Python code for the same task. Also, that C blocks are explicitly delimited makes C code a little longer than Python code. I'm not sure that 5-10x is a good estimate, though, and I think that in this case you're comparing apples to oranges. The code analogous to your Python code is simply

    int main(int argc, char* argv[]) {
        if (endswith(argv[argc-1], ".py")) {
            // ...
        } 
    }
    

    That C has no built-in endswith() function is a separate matter.
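
One detail about argv[argc-1] may be worth guarding against. The C standard allows argc to be 0, in which case argv[argc-1] would index before the start of the array. A minimal sketch of a guarded check follows; the helper names last_arg_is_python_file and last_arg_examples are hypothetical, and endswith is repeated only so the block compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length-based suffix test, as in the answer above. */
bool endswith(const char *str, const char *substr)
{
    if (!str || !substr) return false;

    size_t length_of_string    = strlen(str);
    size_t length_of_substring = strlen(substr);

    if (length_of_substring > length_of_string) return false;

    return strcmp(str + length_of_string - length_of_substring, substr) == 0;
}

/* Hypothetical helper: does the last command-line argument end in ".py"?
   Returns false instead of reading argv[-1] when argc is 0. */
bool last_arg_is_python_file(int argc, char *argv[])
{
    if (argc < 1 || !argv) return false;
    return endswith(argv[argc - 1], ".py");
}

void last_arg_examples(void)
{
    char *with_script[] = { "interp", "prog.py", NULL };
    char *no_args[]     = { "interp", NULL };

    assert(last_arg_is_python_file(2, with_script));
    /* With no arguments, the last element is the program name itself. */
    assert(!last_arg_is_python_file(1, no_args));
    assert(!last_arg_is_python_file(0, NULL));
}
```

From main() this would be called as last_arg_is_python_file(argc, argv).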