Tags: matlab, shuffle, hamming-distance

Shuffling a string: code in any language


I am trying to write a program that shuffles a DNA sequence as thoroughly as possible, to destroy the order in the sequence. I have written MATLAB code, but it is too slow. I was also looking into the Hamming distance and the Levenshtein distance: how can I incorporate those measures to ensure proper shuffling? These are the rules I followed when shuffling:

  • rule 1: the i-th residue should not end up next to the residues originally at positions i-1, i-2, i-3, i+1, i+2, i+3
  • rule 2: in the new arrangement, a residue's new position must differ from its old position by at least 20 places, i.e. if A was at the 1st position in the original string, it must be at the 21st position or later in the shuffled string.
function seq = shuffling(str)
% Rejection sampling: draw random permutations until one has no fixed
% points, satisfies rule 1, and has a fraction > t1 of positions meeting rule 2.

len = length(str);
seqlen = 1:len;

% t1: required fraction of positions displaced by more than 20 places (rule 2)
t1 = 0.4;
if len > 150
    t1 = 0.90;
elseif len >= 100
    t1 = 0.7;
end

while true
    shufseq = randperm(len);
    shift = shufseq - seqlen;     % difference between original and shuffled order indices
    step = abs(diff(shufseq));    % original-index gap between neighbours in the shuffle

    noFixedPoint = all(shift ~= 0);
    rule1 = all(step > 4);        % rejects gaps of 1 to 4, one more than rule 1 states
    rule2 = sum(abs(shift) > 20) / len > t1;

    if noFixedPoint && rule1 && rule2
        break
    end
end
seq = str(shufseq);
end
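
On the Hamming distance part of the question, a minimal sketch of how it could be measured between the original and a shuffled sequence is given below. The example sequence and the variable names (`hammingFraction`, `expectedFraction`) are made up for illustration. The point it tries to show is that index-based rules say nothing about the letters themselves: for a uniformly random permutation the expected fraction of changed positions is 1 minus the sum of squared base frequencies, which is 0.75 when all four bases are equally frequent.

```matlab
% Sketch only: Hamming distance between a sequence and a random shuffle of it.
str = repmat('ACGT', 1, 8);          % made-up example, 32 residues, equal base counts
shufseq = randperm(length(str));     % random permutation of the indices
seq = str(shufseq);                  % shuffled sequence (same composition)

% Hamming distance: number of positions where the two strings differ.
hammingCount = sum(str ~= seq);
hammingFraction = hammingCount / length(str);

% Chance level for a uniformly random shuffle: 1 - sum of squared base frequencies.
bases = 'ACGT';
p = zeros(1, 4);
for k = 1:4
    p(k) = sum(str == bases(k)) / length(str);
end
expectedFraction = 1 - sum(p.^2);

% A permutation never changes composition, so the sorted strings must match.
sameComposition = isequal(sort(str), sort(seq));
```

A value of `hammingFraction` well below `expectedFraction` would suggest that the shuffle left more letters in place than chance alone would; this could be used as an extra acceptance test alongside the two rules.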

Solution

  • I came up with one alternative: determine the composition of the string, i.e. the count of each unique letter, then pick one of these letters at random and reduce its count by 1. This is repeated once for every position in the sequence.

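    function seq=newshuffle(str)
    %#codegen
    len=length(str);
    seq=repmat(' ',1,len);   % preallocate the output
    ndict=['A';'C';'G';'T'];
    % Remaining count of each base, computed directly
    % (assumes str holds only uppercase A, C, G, T).
    ncomp=zeros(4,1);
    for k=1:4
        ncomp(k)=sum(str==ndict(k));
    end
    for l=1:len
        % Pick a base at random, retrying until one with copies left is found.
        while 1
            x=randi(4,1,1);
            if ncomp(x)~=0
                break;
            end
        end
        seq(l)=ndict(x);
        ncomp(x)=ncomp(x)-1;
    end
    end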
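
As a quick sanity check on any composition-based shuffle, the per-base counts of the input and output can be compared. The sketch below is self-contained: it uses a made-up sequence and `str(randperm(...))` as a stand-in for the output of the shuffling function, so the variable names are illustrative only.

```matlab
% Sketch only: confirm that a shuffle keeps the base composition.
str = 'AACCCGGGGT';                   % made-up example sequence
seq = str(randperm(length(str)));     % stand-in for the shuffled output

bases = 'ACGT';
countsIn  = zeros(1, 4);
countsOut = zeros(1, 4);
for k = 1:4
    countsIn(k)  = sum(str == bases(k));   % count of each base in the input
    countsOut(k) = sum(seq == bases(k));   % count of each base in the output
end

compositionPreserved = isequal(countsIn, countsOut);
```

If `compositionPreserved` is false, the shuffle has dropped or invented residues, for instance because the input contained a letter outside A, C, G, T.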