java arrays string dna-sequence

Converting a SequenceCharSequence array into a smaller array of the same type that excludes indices that have - chars


The following method ends up building strings that start with null characters, and I don't know why. Any explanation and/or better solution would be greatly appreciated. Sincerely, mrBurlCe

/* sample input:
SequenceCharSequence[] seqs = new SequenceCharSequence[3][7]; 

seqs[0] = {'A', 'T', 'A', '-', 'G', 'T', 'C'};

seqs[1] = {'A', 'T', 'A', 'A', '-', 'T', 'G'};

seqs[2] = {'A', 'C', '-', 'A', 'G', 'T', 'A'};

int[] range = {1, 7};

expected output variable equivalence:

ugsS[0] == "TTC";

ugsS[1] == "TTG";

ugsS[2] == "CTA";
*/
/* The method below is supposed to take in a SequenceCharSequence[x][y] array of standard
IUPAC nucleic acid characters such as A, T, G, C and -.
It looks at every index y within the given range and checks for '-' chars across
all outer indices (x).
If none of the elements at the current index is a '-' char, they are appended to the
string at their corresponding outer index, x, in the String[] ugsS[x].
*/
static String[] gsStr(SequenceCharSequence[] seqs, int[] range) {
    String[] ugsS = new String[seqs.length];
    for(int i = range[0]; i < range[1]; i++) {
        boolean notGap = true;
        for(int sI = 0; sI < seqs.length; sI++) {
            if(seqs[sI].charAt(i) == '-') {
                notGap = false;
            } 
        }
        if(notGap) {
            for(int sII = 0; sII < seqs.length; sII++) {
                ugsS[sII] += seqs[sII].charAt(i);
            }
        }
    }
    return ugsS;
}

Solution

  • String[] ugsS = new String[seqs.length];
    

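    This only allocates the array itself; every element starts out as null. Using += on a null String converts it to the text "null" before appending, which is why your results begin with "null". You'll need to initialize each element of the array first.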

    StringBuilder[] ugsS = new StringBuilder[seqs.length];
    for (int i = 0; i < seqs.length; ++i)
      ugsS[i] = new StringBuilder(seqs[i].length());
    

    This will initialize an array of StringBuilders, each with an initial capacity equal to the length of the associated sequence.

    if(notGap) {
      for(int sII = 0; sII < seqs.length; sII++) {
         ugsS[sII].append(seqs[sII].charAt(i));
      }
    }
    

    This replaces your +=, which creates a new String on the heap for every appended character, each to be garbage collected later on.

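    When you want to display or otherwise access your compacted sequences, call .toString() on each StringBuilder instance. Because the method is declared to return String[], either convert the builders before returning or change the return type to StringBuilder[].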
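
Putting the pieces together, here is a minimal sketch of the whole method. It assumes your SequenceCharSequence type can be treated as a java.lang.CharSequence (I only rely on charAt); the class name GapFilter and the method name ungappedColumns are made up for this example. The upper bound range[1] is treated as exclusive, as in your loop.

```java
public class GapFilter {

    // For each sequence, collects the characters at columns in
    // [range[0], range[1]) where no sequence has a '-' gap.
    // Assumption: sequences are exposed as CharSequence (hypothetical stand-in
    // for SequenceCharSequence).
    static String[] ungappedColumns(CharSequence[] seqs, int[] range) {
        // Create one builder per sequence so nothing is ever appended to null.
        StringBuilder[] builders = new StringBuilder[seqs.length];
        for (int s = 0; s < seqs.length; s++) {
            builders[s] = new StringBuilder(range[1] - range[0]);
        }

        for (int col = range[0]; col < range[1]; col++) {
            // A column is kept only if every sequence has a non-gap char there.
            boolean gapFree = true;
            for (CharSequence seq : seqs) {
                if (seq.charAt(col) == '-') {
                    gapFree = false;
                    break; // one gap is enough to skip this column
                }
            }
            if (gapFree) {
                for (int s = 0; s < seqs.length; s++) {
                    builders[s].append(seqs[s].charAt(col));
                }
            }
        }

        // Convert the builders to Strings to match the original return type.
        String[] result = new String[seqs.length];
        for (int s = 0; s < seqs.length; s++) {
            result[s] = builders[s].toString();
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input from the question, written as String literals.
        CharSequence[] seqs = {"ATA-GTC", "ATAA-TG", "AC-AGTA"};
        String[] out = ungappedColumns(seqs, new int[] {1, 7});
        System.out.println(String.join(",", out)); // prints "TTC,TTG,CTA"
    }
}
```

If you would rather keep String[] and +=, filling the array with empty strings first (for example with java.util.Arrays.fill(ugsS, "")) should also get rid of the "null" prefix, at the cost of the extra allocations mentioned above.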