
Is concatenating arbitrary number of strings with nested function calls in C undefined behavior?


I have an application that builds file path names through a series of string concatenations using pieces of text to create a complete file path name.

The question is whether an approach to handle concatenating a small but arbitrary number of strings of text together depends on Undefined Behavior for success.

Is the order of evaluation of a series of nested functions guaranteed or not?

I found the question "Nested function calls order of evaluation"; however, it seems to be about multiple function calls within a single argument list rather than a chain of nested calls.

Please excuse the names in the following code samples. They are consistent with the rest of the source code, and I am testing things out a bit first.

My first cut on the need to concatenate several strings was a function that looked like the following which would concatenate up to three text strings into a single string.

typedef wchar_t TCHAR;

TCHAR *RflCatFilePath(TCHAR *tszDest, int nDestLen, TCHAR *tszPath, TCHAR *tszPath2, TCHAR *tszFileName)
{
    if (tszDest && nDestLen > 0) {
        TCHAR *pDest = tszDest;
        TCHAR *pLast = tszDest;

        *pDest = 0;   // ensure empty string if no path data provided.

        if (tszPath) for (pDest = pLast; nDestLen > 0 && (*pDest++ = *tszPath++); nDestLen--) pLast = pDest;
        if (tszPath2) for (pDest = pLast; nDestLen > 0 && (*pDest++ = *tszPath2++); nDestLen--)  pLast = pDest;
        if (tszFileName) for (pDest = pLast; nDestLen > 0 && (*pDest++ = *tszFileName++); nDestLen--)  pLast = pDest;
    }

    return tszDest;
}

Then I ran into a case where I had four pieces of text to put together.

Thinking through this it seemed that most probably there would also be a case for five that would be uncovered shortly so I wondered if there was a different way for an arbitrary number of strings.

What I came up with is two functions as follows.

typedef wchar_t TCHAR;

typedef struct {
    TCHAR *pDest;
    TCHAR *pLast;
    int    destLen;
} RflCatStruct;

RflCatStruct RflCatFilePathX(const TCHAR *pPath, RflCatStruct x)
{
    TCHAR *pDest = x.pLast;
    if (pDest && pPath) for ( ; x.destLen > 0 && (*pDest++ = *pPath++); x.destLen--)  x.pLast = pDest;
    return x;
}

RflCatStruct RflCatFilePathY(TCHAR *buffDest, int nLen, const TCHAR *pPath)
{
    RflCatStruct  x = { 0 };

    TCHAR *pDest = x.pDest = buffDest;
    x.pLast = buffDest;
    x.destLen = nLen;

    if (buffDest && nLen > 0) {   // ensure there is room for at least one character.
        *pDest = 0;   // ensure empty string if no path data provided.
        if (pPath) for (pDest = x.pLast; x.destLen > 0 && (*pDest++ = *pPath++); x.destLen--)  x.pLast = pDest;
    }
    return x;
}

Examples of using these two functions are as follows. This code appears to work fine with Visual Studio 2013.

TCHAR buffDest[512] = { 0 };
TCHAR *pPath = L"C:\\flashdisk\\ncr\\database";
TCHAR *pPath2 = L"\\";
TCHAR *pFilename = L"filename.ext";

RflCatFilePathX(pFilename, RflCatFilePathX(pPath2, RflCatFilePathY(buffDest, 512, pPath)));
printf("dest t = \"%S\"\n", buffDest);


printf("dest t = \"%S\"\n", RflCatFilePathX(pFilename, RflCatFilePathX(pPath2, RflCatFilePathY(buffDest, 512, pPath))).pDest);


RflCatStruct  dStr = RflCatFilePathX(pPath2, RflCatFilePathY(buffDest, 512, pPath));
//   other stuff then
printf("dest t = \"%S\"\n", RflCatFilePathX(pFilename, dStr).pDest);

Solution

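  • Arguments to a function call are completely evaluated before the function is invoked, so the nested calls to RflCatFilePath* are evaluated in the expected order, innermost first. (This is guaranteed by C11 §6.5.2.2/10: "There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call.") What the standard leaves unspecified is only the relative order in which several arguments of the same call are evaluated, which is what the linked question is about; a chain of nested calls does not depend on that.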
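
To make the guarantee concrete, here is a minimal sketch using a hypothetical helper named append (not from the question) that returns its destination so that calls can be nested. It uses narrow char strings for brevity. Each inner call is an argument of the call around it, so it has to finish before the outer call starts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration: appends src to the string already
   in dst (total capacity cap, including the terminator) and returns dst,
   so the result can be fed straight into another call. */
static char *append(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst);
    if (used < cap)
        strncat(dst, src, cap - used - 1);  /* truncates if space runs out */
    return dst;
}

/* Builds "dir" + "sep" + "name" by nesting. The innermost append is an
   argument of the middle one, which is an argument of the outermost one,
   so the pieces are always appended in this order. */
static char *build_path(char *buf, size_t cap,
                        const char *dir, const char *sep, const char *name)
{
    buf[0] = '\0';
    return append(append(append(buf, cap, dir), cap, sep), cap, name);
}
```

By contrast, in a call such as f(g(), h()) the order in which g and h run is unspecified, although both still complete before f is entered.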

    As indicated in a comment, the snprintf function is likely to be a better choice for this problem. (asprintf would be even better, and there is a freely available shim for it which works on Windows.) The only problem with snprintf is that you may have to call it twice. It always returns the number of characters which would have been stored in the buffer had there been enough space, not counting the terminating NUL, so if the return value is not less than the size of the buffer, you will need to allocate a larger buffer (whose size you now know: the return value plus one) and call snprintf again.

    asprintf does that for you, but it is a BSD/GNU extension to the standard library.

    In the case of concatenating filepaths, there is a maximum string length supported by the operating system/file system, and you should be able to find out what it is (although it might require OS-specific calls on non-POSIX systems). So it might well be reasonable to simply return an error indication if the concatenation does not fit into a 512-character buffer.
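
The two-call pattern could be sketched as follows. The function name join_path_alloc and the size of the first-try buffer are assumptions made for illustration, and the code presumes a C99-conforming snprintf and narrow char strings.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd "dir" + "sep" + "name", or NULL
   on failure. The caller is responsible for calling free() on the result. */
char *join_path_alloc(const char *dir, const char *sep, const char *name)
{
    char first_try[32];  /* deliberately small so the second call gets used */
    int needed = snprintf(first_try, sizeof first_try, "%s%s%s", dir, sep, name);
    if (needed < 0)
        return NULL;                     /* formatting error */

    size_t size = (size_t)needed + 1;    /* return value excludes the NUL */
    char *result = malloc(size);
    if (result == NULL)
        return NULL;

    if (size <= sizeof first_try)
        memcpy(result, first_try, size); /* first attempt fit completely */
    else
        snprintf(result, size, "%s%s%s", dir, sep, name); /* second call */
    return result;
}
```

A call such as join_path_alloc(dir, "\\", name) then yields a buffer of exactly the right size, at the cost of formatting long results twice.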
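
A fixed-buffer variant that reports an error instead of truncating might look like this. The name join_path_fixed and the 0 / -1 return convention are assumptions for the sketch; it again uses narrow strings and a conforming snprintf.

```c
#include <stdio.h>

/* Hypothetical helper: writes "dir" + "sep" + "name" into dest.
   Returns 0 on success, or -1 if the result does not fit in dest_size
   characters (terminator included) or if formatting fails. On failure
   dest is left as an empty string rather than a truncated path. */
int join_path_fixed(char *dest, size_t dest_size,
                    const char *dir, const char *sep, const char *name)
{
    if (dest == NULL || dest_size == 0)
        return -1;

    int n = snprintf(dest, dest_size, "%s%s%s", dir, sep, name);
    if (n < 0 || (size_t)n >= dest_size) {
        dest[0] = '\0';   /* never hand back a silently truncated path */
        return -1;
    }
    return 0;
}
```

Leaving the buffer empty on failure is a design choice: a truncated file path is usually worse than no path at all, because it may name a different, existing file.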

    Just for fun, I include a recursive varargs concatenator:

    #include <stdarg.h>
    #include <stdlib.h>
    #include <string.h>
    
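    static char* concat_helper(size_t accum, char* chunk, va_list ap) {
      if (chunk) {
        /* Going "there": add up the total length while descending. */
        size_t chunklen = strlen(chunk);
        char* next_chunk = va_arg(ap, char*);
        char* retval = concat_helper(accum + chunklen, next_chunk, ap);
        /* Coming "back": copy this chunk into place at its offset. */
        if (retval) memcpy(retval + accum, chunk, chunklen);
        return retval;
      } else {
        /* End of the list: the total length is known, so allocate once. */
        char* retval = malloc(accum + 1);
        if (retval) retval[accum] = 0;
        return retval;
      }
    }
    /* Returns a malloc'd string (NULL if allocation fails); the caller frees it. */
    char* concat_list(char* chunk, ...) {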
        va_list ap;
        va_start(ap, chunk);
        char* retval = concat_helper(0, chunk, ap);
        va_end(ap);
        return retval;
    }
    

    Since concat_list is a varargs function, you need to supply (char*)NULL at the end of the arguments. On the other hand, you don't need to repeat the function name for each new argument. So an example call might be:

    concat_list(pPath, pPath2, pFilename, (char*)0);
    

    (I suppose you need a wchar_t* version, but the changes should be obvious: wcslen for strlen and wmemcpy for memcpy. Watch out for the malloc, whose size then has to be multiplied by sizeof(wchar_t).) For production purposes, the recursion should probably be replaced by an iterative version which traverses the argument list twice (see va_copy), but I've always been fond of the "there-and-back" recursion pattern.
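
The iterative alternative could be sketched roughly as follows. The name concat_list_iter is hypothetical, and like concat_list it assumes every argument is a char* and that the list ends with (char*)NULL. va_copy makes a second, independent cursor so the arguments can be walked twice: once to measure, once to copy.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical iterative counterpart of concat_list. Returns a malloc'd
   string (NULL if allocation fails); the caller frees it. */
char *concat_list_iter(const char *first, ...)
{
    va_list measure, copy;
    size_t total = 0;

    va_start(measure, first);
    va_copy(copy, measure);        /* second cursor for the second pass */

    /* Pass 1: add up the lengths of all chunks. */
    for (const char *s = first; s != NULL; s = va_arg(measure, char *))
        total += strlen(s);
    va_end(measure);

    char *result = malloc(total + 1);
    if (result != NULL) {
        /* Pass 2: copy each chunk behind the previous one. */
        char *out = result;
        for (const char *s = first; s != NULL; s = va_arg(copy, char *)) {
            size_t len = strlen(s);
            memcpy(out, s, len);
            out += len;
        }
        *out = '\0';
    }
    va_end(copy);
    return result;
}
```

Unlike the recursive version, the stack usage here does not grow with the number of arguments.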