C++ is mixing my strings?


I have this really simple C++ function I wrote myself.
It should just strip the '-' characters out of my string.
Here's the code:

char* FastaManager::stripAlignment(char *seq, int seqLength){
    char newSeq[seqLength];
    int j=0;
    for (int i=0; i<seqLength; i++) {
        if (seq[i] != '-') {
            newSeq[j++]=seq[i];
        }
    }

    char *retSeq = (char*)malloc((--j)*sizeof(char));
    for (int i=0; i<j; i++) {
        retSeq[i]=newSeq[i];
    }
    retSeq[j+1]='\0'; //WTF it keeps reading from memory without this
    return retSeq;
}

I think that comment speaks for itself.
I don't know why, but when I launch the program and print out the result, I get something like:

'stripped_sequence''original_sequence'

However, if I debug the code to see whether anything is wrong, the flow goes just right and ends up returning the correct stripped sequence.

I tried to print out the memory of the two variables, and here are the memory readings:

memory for seq: https://i.sstatic.net/dHI8k.png

memory for *seq: https://i.sstatic.net/UqVkX.png

memory for retSeq: https://i.sstatic.net/o9uvI.png

memory for *retSeq: https://i.sstatic.net/ioFsu.png

(couldn't include links / pics because of spam filter, sorry)

This is the code I'm using to print out the strings:

for (int i=0; i<atoi(argv[2]); i++) {
    char *seq;
    if (usingStructure) {
        seq = fm.generateSequenceWithStructure(structure);            
    }else{
        seq = fm.generateSequenceFromProfile();
    }
    cout<<">Sequence "<<i+1<<": "<<seq<<endl;
}

Now I really have no idea what's going on.


Solution

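  • This happens because you put the terminating zero of a C string outside the allocated space. After the filtering loop, j already holds the number of kept characters; --j makes it one smaller, so malloc reserves one byte too few, the copy loop drops the last kept character, and retSeq[j+1] writes outside the block entirely. Without that write there is no terminator at all, so printing runs on into whatever happens to follow in memory. Both cases are undefined behavior, which is why the program can behave differently under the debugger. You should allocate one extra character for the terminator and put '\0' there. Or better yet, use std::string.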

    char *retSeq = (char*)malloc((j+1)*sizeof(char));
    for (int i=0; i<j; i++) {
        retSeq[i]=newSeq[i];
    }
    retSeq[j]='\0';
    

    it keeps reading from memory without this

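    This is by design: C strings are zero-terminated. '\0' signals to string routines in C that the end of the string has been reached. The same convention holds in C++ when you work with C strings: cout << seq with a char* keeps printing characters until it reaches a '\0', however far away that is.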
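A minimal sketch of both fixes, assuming '-' is the only gap character as in the question. stripAlignmentC is a hypothetical free-function name (the original is a FastaManager member); it keeps the char* interface, allocates room for the terminator up front, and also avoids the non-standard variable-length array newSeq. The std::string overload uses the standard erase-remove idiom.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical free-function version of FastaManager::stripAlignment that
// keeps the C-string interface. The caller owns the result and must free() it.
char* stripAlignmentC(const char* seq, int seqLength) {
    assert(seq != nullptr && seqLength >= 0);
    // Worst case nothing is removed: seqLength characters plus one for '\0'.
    char* retSeq = static_cast<char*>(
        std::malloc(static_cast<std::size_t>(seqLength) + 1));
    if (retSeq == nullptr) {
        return nullptr;
    }
    int j = 0;
    for (int i = 0; i < seqLength; i++) {
        if (seq[i] != '-') {
            retSeq[j++] = seq[i];
        }
    }
    // j is the number of kept characters (at most seqLength), so index j
    // is always inside the seqLength + 1 bytes allocated above.
    retSeq[j] = '\0';
    return retSeq;
}

// std::string version: the string tracks its own length and memory,
// so there is no terminator to place by hand and nothing to free.
std::string stripAlignment(const std::string& seq) {
    std::string out = seq;
    out.erase(std::remove(out.begin(), out.end(), '-'), out.end());
    return out;
}
```

For example, stripAlignment("AC-GT--A") gives "ACGTA". The C-string version over-allocates by the number of removed dashes in exchange for a single pass; a two-pass version could count the kept characters first and allocate exactly count + 1 bytes.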