Tags: c, libcurl, bus-error

Bus Error on void function return


I'm learning to use libcurl in C. To start, I'm using a randomized list of accession names to search for protein sequence files that may be found hosted here. These follow a set format: the first line is of variable length (and contains no information I'm trying to query), followed by a series of capital letters with a newline every sixty (60) characters. That series is what I want to pull down, reformatted to eighty (80) characters per line.

I have the call itself in a single function:

//finds and saves the fastas for each protein (assuming one exists)
void pullFasta (proteinEntry *entry, char matchType, FILE *outFile) {
    //Local variables
    URL_FILE *handle;
    char buffer[2] = "", url[32] = "http://www.uniprot.org/uniprot/", sequence[2] = "";

    //Build full URL
    /*printf ("u:%s\nt:%s\n", url, entry->title); /*This line was used for debugging.*/
    strcat (url, entry->title);
    strcat (url, ".fasta");

    //Open URL
    /*printf ("u:%s\n", url); /*This line was used for debugging.*/
    handle = url_fopen (url, "r");

    //If there is data there
    if (handle != NULL) {
        //Skip the first line as it's got useless info
        do {
            url_fread(buffer, 1, 1, handle);
        } while (buffer[0] != '\n');

        //Grab the fasta data, skipping newline characters
        while (!url_feof (handle)) {
            url_fread(buffer, 1, 1, handle);
            if (buffer[0] != '\n') {
                strcat (sequence, buffer);
            }
        }

        //Print it
        printFastaEntry (entry->title, sequence, matchType, outFile);
    }
    url_fclose (handle);
    return;
}

With proteinEntry being defined as:

//Entry for fasta formatable data
typedef struct proteinEntry {
    char title[7];
    struct proteinEntry *next;
} proteinEntry;

The url_fopen, url_fclose, url_feof, url_fread, and URL_FILE code is found here; these mimic the file functions for which they are named.

As you can see, I've been doing some debugging with the URL generator (UniProt URLs follow the same format for different proteins). I got it working properly and can pull down the data from the site and save it to a file in the format I want. I set the read buffer to 1 because I wanted a program that was very simple but functional (if inelegant) before I start playing with things, so that I would have a base to return to as I learned.

I've tested the url_<function> calls and they give no errors. I then added incremental printf calls after each line to identify exactly where the bus error occurs, and it happens at return;.

My understanding is that a bus error is a memory access issue, wherein I'm trying to get at memory that my program doesn't have control over. My confusion comes from the fact that this is happening at the return of a void function. There's nothing being read, written, or passed to trigger the memory error (as far as I understand it, at least).

Can anyone point me in the right direction to fix my mistake please?

EDIT: As @BLUEPIXY pointed out, I had a potential url_fclose (NULL). As @deltheil pointed out, I had sequence as a fixed-size array. This also made me notice that I was repeating the same bad memory allocation for url, so I updated both and it now works. Thanks for your help!


Solution

  • If we look at, e.g., http://www.uniprot.org/uniprot/Q6GZX1.fasta and skip the first line (as you do), we have:

    MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
    

    which is a 60-character string.

    When you try to read this sequence with:

    //Grab the fasta data, skipping newline characters
    while (!url_feof (handle)) {
        url_fread(buffer, 1, 1, handle);
        if (buffer[0] != '\n') {
            strcat (sequence, buffer);
        }
    }
    

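    The problem is that sequence is not expandable and not large enough: it is a fixed-length array of size 2, so it can hold only one character plus the terminating '\0'. Every strcat beyond that writes past the end of the array and corrupts neighboring stack memory, which can include the function's saved return address. That would explain why the crash only shows up at return;. The same applies to url: its 32 bytes are already completely filled by the base URL (31 characters plus the terminator), so appending the title and ".fasta" overflows it as well.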

    So make sure to choose a large enough size to hold any sequence, or implement the ability to expand it on-the-fly.
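
As a rough illustration of the "expand it on the fly" option, here is a minimal sketch of a growable buffer, together with a bounds-checked way to build the URL. The names SeqBuf, seqbuf_push, seqbuf_free, and build_fasta_url are hypothetical helpers invented for this example; they are not part of libcurl or of the url_fopen code, and the initial capacity of 64 and the doubling strategy are arbitrary choices.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string (illustration only, not a libcurl type). */
typedef struct {
    char  *data;  /* NUL-terminated contents, or NULL before the first push */
    size_t len;   /* characters stored, excluding the terminator */
    size_t cap;   /* bytes currently allocated for data */
} SeqBuf;

/* Append one character, doubling the allocation when it is full.
   Returns 0 on success, -1 if realloc fails (existing contents stay valid). */
int seqbuf_push(SeqBuf *sb, char c) {
    if (sb->len + 2 > sb->cap) {               /* need room for c and '\0' */
        size_t new_cap = sb->cap ? sb->cap * 2 : 64;
        char *p = realloc(sb->data, new_cap);
        if (p == NULL) {
            return -1;
        }
        sb->data = p;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';
    return 0;
}

/* Release the buffer and reset it to the empty state. */
void seqbuf_free(SeqBuf *sb) {
    free(sb->data);
    sb->data = NULL;
    sb->len = 0;
    sb->cap = 0;
}

/* Build the request URL into a caller-supplied buffer.
   Returns 0 on success, -1 if the result would not fit in out_size bytes. */
int build_fasta_url(char *out, size_t out_size, const char *accession) {
    int n = snprintf(out, out_size,
                     "http://www.uniprot.org/uniprot/%s.fasta", accession);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

In pullFasta, this would mean declaring `SeqBuf seq = {0};` in place of `sequence[2]`, calling `seqbuf_push(&seq, buffer[0])` instead of `strcat (sequence, buffer)`, passing `seq.data` to printFastaEntry, and calling `seqbuf_free(&seq)` before returning. Likewise, url could be declared with room to spare (for example `char url[64]`) and filled with `build_fasta_url(url, sizeof url, entry->title)`.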