I have a function which takes two strings representing DNA (Of the form "ACGT"
) with each letter representing a different base. I would like to test for the number of mutations that have needed to occur for one to become the other.
The types of mutations that I want to consider are:
I can get each of these three things working individually, but the problem that I'm facing is being able to test for all three at once.
For example, the code I've written to test for substitutions, will count all bases in the event of a insertion/deletion mutation.
What sort of algorithm would I need to be able to find the minimum number of mutations to go between two DNA sequences?
This kind of different is called 'edit-distance'
It can be calculated by dynamic programming:
For more details how to implement it see:
This secific edit-distance is also known as Levenshtein distance (You can find sample code: here)