I have an array of DNA sequences, like:
AA
TA
AC
CC
and I search a faster way to calculate the hamming distance between all sequences pairs (maybe by sorting...), then naive approach (O(N^2))
For motif1 in array
For motif2 in array
calculate Hamming_Distance(motif1 , motif2)
end
end
I need the sequence of pairs that have an Hamming distance = 1
If your n >> k, then you can try following
Your original complexity is O(nnk), where k is length of the sequence (as Hamming distance comparison requires k steps). Let's try to improve on that.
With that, you have replaced O(nnk) with O(nkk), which should be beneficial if n >> k.
For k >> n you probably don't really care about n^2 part, so use trivial algorithm.
For k near n, you can try what I have suggested in comment