Search code examples
c++linuxftplibcurlpowerpc

libCurl upload data inactivity timeout not working


I have a cross platform embedded libCurl client application running on a powerpc that behaves differently to its windows counterpart. The basic problem is that the remote server that my client uploads a file to executes a very long operation prior to returning the 226 response (indicating successful upload). The remote FTP server is actually performing flash reclamation at this time and this operation can take up to 900 seconds. In effect, I am attempting to use a data inactivity timeout while awaiting the remote 226 or error response.

On windows this works fine, however on the PowerPC embedded client (where we link against the latest libCurl-7.39.0 library compiled using Mentor Graphics Code Sourcery toolchain for PowerGNU) the client times out after exactly 60 seconds of FTP inactivity.

The way I set the timers is as shown in the code snipped below (note that I am ensuring that the CURLOPT_FTP_RESPONSE_TIMEOUT has a 1 second lower value than the CURLOPT_TIMEOUT. In addition, it is worth noting that the CURLOPT_CONNECTTIMEOUT is set to 60 seconds (perhaps this is coincidence, but it takes CURLOPT_CONNECTTIMEOUT (ie 60) seconds for the inactivity to timeout on the powerPC linux client). I wonder if there is some bug lurking in the CURLOPT_CONNECTTIMEOUT is overwriting or corrupting the CURLOPT_FTP_RESPONSE_TIMEOUT on the linux client?

Other than that my curl options appear to be working fine. I read an article about the implementation of timers within libCurl where it appears that the timers are organized in some sort of 'first to expire' order, perhaps at the time when I update the default CURLOPT_FTP_RESPONSE_TIMEOUT (which defaults to indefinite), its insertion causes a corruption of the timer queue.

// if updating the module could potentially
// cause flash reclamation, set the command to response FTP
// timer to include both delivery time + the max expected
// time for the file put for the biggest file over BASE2 or BASET
auto flashReclTimeout = rContext.getFlashReclTimeout();
if (flashReclTimeout) {
    auto timeoutSecs = duration_cast<seconds>(flashReclTimeout.get());
    auto res = curl_easy_setopt(rContext.getCurlHandle(),
        CURLOPT_TIMEOUT, timeoutSecs.count()+1);
    res = curl_easy_setopt(rContext.getCurlHandle(),
        CURLOPT_FTP_RESPONSE_TIMEOUT, timeoutSecs.count());
    ss  << ", [flash reclamation timeout "
        << timeoutSecs.count()
        << "(s)]";
}
LOG_EVT_INFO(gEvtLog) << rLogPrefix << ss.str() << std::endl;

My default libCurl options are setup as follows

/**
 * Sets the curl options using the current mContextInfo.
 *
 * This never sets the URI curl field as this must be
 * done outside the context object.
 */
void
SLDBContext::setCurlOptions() {
    CURL* pCurl = mCurlHandle.get();
    // reset all curl context info
    curl_easy_reset(pCurl);
    // DEOS does not support EPSV or EPRT

    auto res = curl_easy_setopt(pCurl, CURLOPT_FTP_USE_EPSV, 0L);
    res = curl_easy_setopt(pCurl, CURLOPT_FTP_USE_EPRT, 0L);
    res = curl_easy_setopt(pCurl, CURLOPT_NOSIGNAL, 1L);
#if 0
    // send out TCP keep-alive probes - not required
    res = curl_easy_setopt(pCurl, CURLOPT_TCP_KEEPALIVE, 1L);
    // check to ensure that this is supported
    if (res == CURLE_OK) {
        // wait for at least 30 seconds before sending keep-alive probes
        // every 2 seconds
        res = curl_easy_setopt(pCurl, CURLOPT_TCP_KEEPIDLE, 30L);
        res = curl_easy_setopt(pCurl, CURLOPT_TCP_KEEPINTVL, 30L);
    }
#endif
    // do not perform CWD when traversing the pseudo directories
    res = curl_easy_setopt(pCurl, CURLOPT_FTP_FILEMETHOD, CURLFTPMETHOD_NOCWD);
    res = curl_easy_setopt(pCurl, CURLOPT_CONNECTTIMEOUT, getConnectTimeoutSecs());

    if (!isPASVMode()) {
        res = curl_easy_setopt(pCurl, CURLOPT_FTPPORT, "-");
    }
    // used to capture header traffic
    if (mHeaderCallback) {
        res = curl_easy_setopt(pCurl, CURLOPT_WRITEHEADER, mpHeaderStream);
        res = curl_easy_setopt(pCurl, CURLOPT_HEADERFUNCTION, mHeaderCallback);
    }
    // for FTP GET operations
    if (mWriteDataCallback) {
        res = curl_easy_setopt(pCurl, CURLOPT_WRITEDATA, &mScratchBuffer);
        res = curl_easy_setopt(pCurl, CURLOPT_WRITEFUNCTION, mWriteDataCallback);
    }
    // for FTP PUT operations
    if (mReadFileCallback) {
        res = curl_easy_setopt(pCurl, CURLOPT_READFUNCTION, mReadFileCallback);
    }

    // @JC this feature may be causing slowdowns on the target platform
#if 0
    // capture error details to this buffer
    res = curl_easy_setopt(pCurl, CURLOPT_ERRORBUFFER, mErrorBuffer.get());
#endif

    // progress callback used to track upload progress only
    if (mProgressCallback) {
        res = curl_easy_setopt(pCurl, CURLOPT_XFERINFOFUNCTION, mProgressCallback);
        res = curl_easy_setopt(pCurl, CURLOPT_NOPROGRESS, 0L);
        res = curl_easy_setopt(pCurl, CURLOPT_XFERINFODATA, nullptr);
    }

    // verbose logging
    if (mDebuggingCallback) {
        res = curl_easy_setopt(pCurl, CURLOPT_VERBOSE, 1L);
        res = curl_easy_setopt(pCurl, CURLOPT_DEBUGFUNCTION, mDebuggingCallback);
        res = curl_easy_setopt(pCurl, CURLOPT_DEBUGDATA, nullptr);
    } else {
        res = curl_easy_setopt(pCurl, CURLOPT_VERBOSE, 0L);
        res = curl_easy_setopt(pCurl, CURLOPT_DEBUGDATA, nullptr);
    }

    // disable Nagle algorithm - to fix slowdown in bulk transfers
    // with large data files @JC not necessary
    // res = curl_easy_setopt(pCurl, CURLOPT_TCP_NODELAY, 1L);
    if (mSocketOptionCallback) {
        res = curl_easy_setopt(pCurl, CURLOPT_SOCKOPTDATA, nullptr);
        res = curl_easy_setopt(pCurl, CURLOPT_SOCKOPTFUNCTION, mSocketOptionCallback);
    }
}

Solution

  • Actually I found the problem - turned out to be mostly my issue:

    after much debugging and sprinkling of printouts on our target platform, it turns out that the source of the bug was 75% an application problem (mine) and 25% (in my opinion) an libCurl issue due to the weakness of using loosely coupled va_args to extract arguments from a variable length argument list while setting libCurl options. The problem had to do with an implicit 'long long' to 'long' conversion and also an Endian related issue on the PowerPC platform which was not an issue on the windows platform.

    I use libCurl for our FTP client needs within a C++ application - linked against the standard template C++ library. I use std::chrono::seconds objects to set time and duration libCurl options. Under the covers however, the std::chrono::seconds is a rather complex template type with an internal representation of n 8 byte PPC 'long long' which differs from the 4 byte PPC 'long' that is hard coded in the options below. Due to the loose coupling between passed in 'long long' argument and the actual 'long', the value set in CURLOPT_SERVER_RESPONSE_TIMEOUT was actually the incorrect 4 bytes from the 8 byte 'long long' on the power PC platform. I confirmed this by writing a snippet of code to verify how it worked on windows and not on our 32bit PPC embedded target.

    The way I set fixed the code at the application level was by ensuring that there was an explicit cast to the same type as the va_arg 2nd parameter - this was required as the seconds::count() method returns a long long, without this the CURLOPT_SERVER_RESPONSE_TIMEOUT option was surprisingly set to 0. Hope this is Helpful

    if (flashReclTimeout) {
        // fix for broken flash reclamation timer on target platform
        // caused by 'long long' to 'long' conversion always
        // setting a 0 in the associated timers.
        auto timeoutSecs = duration_cast<seconds>(flashReclTimeout.get());
        /*auto res = */curl_easy_setopt(rContext.getCurlHandle(),
            CURLOPT_TIMEOUT, static_cast<long>(timeoutSecs.count() + 1));
        /*auto res = */curl_easy_setopt(rContext.getCurlHandle(),
            CURLOPT_FTP_RESPONSE_TIMEOUT, static_cast<long>(timeoutSecs.count()));
        ss  << ", [flash reclamation timeout "
            << timeoutSecs.count()
            << "(s)]";
    }
    

    Here is the implementation within libCurl where the CURLOPT_SERVER_RESPONSE_TIMEOUT is set (is a synonym for the CURLOPT_FTP_RESPONSE_TIMEOUT option which I used in my application.

    CURLcode Curl_setopt(struct SessionHandle *data, CURLoption option,
                         va_list param)
    {
      char *argptr;
      CURLcode result = CURLE_OK;
      long arg;
    #ifndef CURL_DISABLE_HTTP
      curl_off_t bigsize;
    #endif
    
      switch(option) {
      case CURLOPT_DNS_CACHE_TIMEOUT:
     .  . .
    
      case CURLOPT_SERVER_RESPONSE_TIMEOUT:
        /*
         * Option that specifies how quickly an server response must be obtained
         * before it is considered failure. For pingpong protocols.
         */
        data->set.server_response_timeout = va_arg( param , long ) * 1000;
        break;
    

    Dan Fandrich on the libCurl user forum correctly pointed out:

    CURLOPT_FTP_RESPONSE_TIMEOUT (formerly CURLOPT_SERVER_RESPONSE_TIMEOUT) is documented to take a long. There's no ambiguity about that. Since curl_easy_setopt uses varargs, there's not much choice other than casting in this situation, or with any other argument to curl_easy_setopt that doesn't match the requested type. I'm glad you found the source of problems in your program, but as the man page for curl_easy_setopt says:

    Read this manual carefully as bad input values may cause libcurl to behave badly!

    And Dan Steinberg, the Maintainer/Author of most of LibCurl responded to my assertion that varargs was a weak api that is prone to user errors:

    Yeah, using varargs for this was probably not the wisest design choice when we created the API some 14 years ago but it also why we continously stress the exact variable type to pass in for each option.

    The typecheck-gcc.h macromania is another way we try to help users to discover these mistakes.

    So to summarize, the actual problem was mine - not reading the documentation correctly, however the underlying weakness of the varargs api contributed to an inherent weakness in the API - the lesson learned was read the manual and be very very careful of any automatic type conversions of underlying types from std::chrono::duration types in my particular case.