I have a really simple C++ function that I wrote myself.
It should just strip the '-' characters out of my string.
Here's the code:
char* FastaManager::stripAlignment(char *seq, int seqLength){
char newSeq[seqLength];
int j=0;
for (int i=0; i<seqLength; i++) {
if (seq[i] != '-') {
newSeq[j++]=seq[i];
}
}
char *retSeq = (char*)malloc((--j)*sizeof(char));
for (int i=0; i<j; i++) {
retSeq[i]=newSeq[i];
}
retSeq[j+1]='\0'; //WTF it keeps reading from memory without this
return retSeq;
}
I think that comment speaks for itself.
I don't know why, but when I launch the program and print out the result, I get something like
'stripped_sequence''original_sequence'
However, if I step through the code in the debugger to see if anything is wrong, the flow goes just right and ends up returning the correct stripped sequence.
I tried to print out the memory of the two variables, and here are the memory readings
memory for seq: https://i.sstatic.net/dHI8k.png
memory for *seq: https://i.sstatic.net/UqVkX.png
memory for retSeq: https://i.sstatic.net/o9uvI.png
memory for *retSeq: https://i.sstatic.net/ioFsu.png
(couldn't include links / pics because of spam filter, sorry)
This is the code I'm using to print out the strings
for (int i=0; i<atoi(argv[2]); i++) {
char *seq;
if (usingStructure) {
seq = fm.generateSequenceWithStructure(structure);
}else{
seq = fm.generateSequenceFromProfile();
}
cout<<">Sequence "<<i+1<<": "<<seq<<endl;
}
Now, I really have no idea what's going on.
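This happens because you put the terminating zero of a C string outside the allocated space. In your code, --j makes the buffer one byte smaller than the number of characters you kept, and retSeq[j+1] then writes beyond the end of it. You should be allocating one extra character at the end of your string copy, and adding '\0' there. Or better yet, you should use std::string.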
char *retSeq = (char*)malloc((j+1)*sizeof(char));
for (int i=0; i<j; i++) {
retSeq[i]=newSeq[i];
}
retSeq[j]='\0';
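For the std::string suggestion above, here is a minimal sketch of what the function could look like. It is written as a free function taking a const std::string&, which is an assumption for illustration (the original is a FastaManager member taking a char* and a length). With std::string there is no manual allocation and no terminator to place by hand.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical std::string variant of stripAlignment (free function for illustration).
std::string stripAlignment(const std::string& seq) {
    std::string result;
    result.reserve(seq.size());  // at most as long as the input
    // Copy every character except the gap marker '-'.
    std::remove_copy(seq.begin(), seq.end(), std::back_inserter(result), '-');
    return result;
}
```

The returned string can be streamed to cout directly, and the caller does not need to free anything.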
"it keeps reading from memory without this"
This is by design: C strings are zero-terminated. '\0' signals to string routines in C that the end of the string has been reached. The same convention holds in C++ when you work with C strings.
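A small sketch of that convention, using a made-up buffer and a hypothetical helper name (visibleLength): the array holds more bytes than the string, but std::strlen stops counting at the first '\0', so everything after it is invisible to string routines.

```cpp
#include <cstddef>
#include <cstring>

// Returns how many characters C string routines see in a buffer that
// holds "ACGT", a terminator, and then leftover bytes.
std::size_t visibleLength() {
    const char buffer[] = {'A', 'C', 'G', 'T', '\0', 'X', 'Y', 'Z', '\0'};
    return std::strlen(buffer);  // stops at the first '\0'
}
```

If the first '\0' were missing, the routine would keep going into the leftover bytes, which matches the 'stripped_sequence''original_sequence' output described in the question.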