I have a DNA sequence, its length for example is m*4n:
B = 'GATTAACTACACTTGAGGCT...';
I have also a vector of real numbers X = {xi, i = 1..m*4n}, and use mod(X,1)
to keep them in the range [0,1]. For example:
X = [0.223 0.33 0.71 0.44 0.91 0.32 0.11 ....... m*4n];
I then need to transform X
into a binary vector by applying the function:
f(x)={0 ,0 < X(i,j) ≤ 0.5; 1 ,0.5 < X(i,j) ≤ 1;)
The output according the previous values will be like X = [0010100 ....]
. If X(i,j)==1
, then B(i,j)
is complemented, otherwise it is unchanged. In this context, the complement is the matching base-pair (i.e. A->T, C->G, G->C, and T->A).
This is the code I've tried so far, with no success:
%%maping X chaotic sequence from real numbers to binary sequence using threshold function
X = v(:,3);
X(257)=[];
disp (X);
mode (X,1);
for i=1
for j=1:256
if ((X(i,j)> 0) && (X(i,j)<= .5))
X(i,j) = 0;
elseif ((X(i,j)> .5) && (X(i,j)<= 1))
X(i,j) = 1;
end
end
end
disp(X);
How can I properly perform the indexing and complement?
Given a sample base-pair sequence stored as a character array:
B = 'GATTAACT';
And a sample vector of numeric values the same length as B
:
X = [0.223 0.33 0.71 0.44 0.91 0.32 0.11 1.6];
Then there is a fairly straightforward solution...
First, your use of the mod
function implies you want to use only the fractional part of each value in X
. This is how you wold do that:
>> X = mod(X, 1)
X =
0.2230 0.3300 0.7100 0.4400 0.9100 0.3200 0.1100 0.6000
Next, you should give the documentation on vectorization a read. It will teach you that a for loop can be avoided for many operations in MATLAB. In particular, applying a logical test to your vector X
can be done like so:
>> index = (X > 0.5)
index =
0 0 1 0 1 0 0 1
And index
is now a logical index the same length as X
with ones (i.e. true) for each value greater than 0.5. You now want to get the characters corresponding to those indices in B
, change them to their complement, then place them back in B
. You can do this using a little trick in MATLAB whereby a character is converted to its ASCII numeric value when used an as index:
>> compMap = ''; % Initialize to an empty string
>> compMap('ACGT') = 'TGCA'
compMap =
T G C A
Notice the characters 'TGCA'
get placed in indices 65, 67, 71, and 84 of compMap
(i.e. the ASCII values for 'ACGT'
). The rest are blanks. Now, you can replace the indexed base-pairs with their complements by simply doing this:
>> B(index) = compMap(B(index))
B =
GAATTACA
Putting this all together, here's the solution:
B = '...'; % Whatever your sequence is
X = [...]; % Whatever your values are
compMap = '';
compMap('ACGT') = 'TGCA'; % Build a complement map
index = (mod(X, 1) > 0.5); % Get your logical index
B(index) = compMap(B(index)); % Replace with complements