The following method ends up building strings that start with null characters, and I don't know why. Any explanation and/or better solution would be greatly appreciated. Sincerely, mrBurlCe
/* sample input:
SequenceCharSequence[] seqs = new SequenceCharSequence[3][7];
seqs[0] = {'A', 'T', 'A', '-', 'G', 'T', 'C'};
seqs[1] = {'A', 'T', 'A', 'A', '-', 'T', 'G'};
seqs[2] = {'A', 'C', '-', 'A', 'G', 'T', 'A'};
int[] range = {1, 7};
expected output variable equivalence:
ugsS[0] == "TTC";
ugsS[1] == "TTG";
ugsS[2] == "CTA";
*/
/* The method below is supposed to take a SequenceCharSequence[x][y] array of standard
IUPAC nucleic acid characters such as A, T, G, C and -.
It looks at every index y in the given range (from range[0] up to, but not including,
range[1]) for '-' chars across the outer indices (x).
If none of the elements at the current index is a '-' char, they are appended to the
string for their corresponding outer index x in the String[] ugsS[x].
*/
static String[] gsStr(SequenceCharSequence[] seqs, int[] range) {
    String[] ugsS = new String[seqs.length];
    for (int i = range[0]; i < range[1]; i++) {
        boolean notGap = true;
        for (int sI = 0; sI < seqs.length; sI++) {
            if (seqs[sI].charAt(i) == '-') {
                notGap = false;
            }
        }
        if (notGap) {
            for (int sII = 0; sII < seqs.length; sII++) {
                ugsS[sII] += seqs[sII].charAt(i);
            }
        }
    }
    return ugsS;
}
String[] ugsS = new String[seqs.length];
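This only allocates the array; every element in it is still null. When you use += on a null String, Java converts the null to the text "null" before appending, which is why each result starts with "null". You'll need to initialize each element of the array first.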
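To see where the leading "null" comes from, here is a minimal sketch that isolates the behaviour. The class and method names (NullConcatDemo, appendToNull) are made up for illustration and are not part of your code.

```java
final class NullConcatDemo {
    // Hypothetical helper: appends one char to an uninitialized String slot.
    static String appendToNull(char c) {
        String[] slots = new String[1]; // the element is null, not ""
        slots[0] += c;                  // null is converted to the text "null" first
        return slots[0];
    }

    public static void main(String[] args) {
        System.out.println(appendToNull('T')); // prints "nullT"
    }
}
```

If you want to keep plain Strings, filling the array with "" before the loop would also remove the prefix.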
StringBuilder[] ugsS = new StringBuilder[seqs.length];
for (int i = 0; i < seqs.length; ++i)
    ugsS[i] = new StringBuilder(seqs[i].length());
This will initialize an array of StringBuilders, each with an initial capacity equal to the length of its associated sequence.
if (notGap) {
    for (int sII = 0; sII < seqs.length; sII++) {
        ugsS[sII].append(seqs[sII].charAt(i));
    }
}
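This replaces your += and avoids creating a new String on the heap for every append, each of which would have to be garbage collected later on.
When you want to display or otherwise access your compacted sequences, call .toString() on each StringBuilder instance. Since your method returns a String[], convert each builder this way before returning.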
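Putting the pieces together, the whole method might look roughly like the sketch below. I don't know what SequenceCharSequence looks like, so this version assumes it behaves like a java.lang.CharSequence and uses CharSequence[] as a stand-in; the class name GapFilter is also just a placeholder.

```java
final class GapFilter {
    // Assumption: CharSequence[] stands in for the question's SequenceCharSequence[].
    static String[] gsStr(CharSequence[] seqs, int[] range) {
        // One builder per sequence, so nothing starts out as null.
        StringBuilder[] builders = new StringBuilder[seqs.length];
        for (int s = 0; s < seqs.length; s++) {
            builders[s] = new StringBuilder(Math.max(0, range[1] - range[0]));
        }

        // range[1] is treated as exclusive, as in the original loop.
        for (int i = range[0]; i < range[1]; i++) {
            boolean notGap = true;
            for (CharSequence seq : seqs) {
                if (seq.charAt(i) == '-') {
                    notGap = false;
                    break; // one gap is enough to skip this column
                }
            }
            if (notGap) {
                for (int s = 0; s < seqs.length; s++) {
                    builders[s].append(seqs[s].charAt(i));
                }
            }
        }

        // Convert back to the String[] the method is declared to return.
        String[] result = new String[seqs.length];
        for (int s = 0; s < seqs.length; s++) {
            result[s] = builders[s].toString();
        }
        return result;
    }

    public static void main(String[] args) {
        CharSequence[] seqs = {"ATA-GTC", "ATAA-TG", "AC-AGTA"};
        String[] out = gsStr(seqs, new int[] {1, 7});
        System.out.println(java.util.Arrays.toString(out)); // prints [TTC, TTG, CTA]
    }
}
```

The main method runs the sample input from the question. The early break is optional; it only saves checking the remaining sequences once a gap has been found in a column.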