Given a string, and a set of four alphabets (A, B, C, D) for generating strings of length n. I need a generalized mathematical formula to calculate the number of neighbors for any string of length n with at most d mismatches, and the number of neighbors with exactly d mismatches.
For example: Given a string=”AAA” and d=3
BAA
CAA
DAA
ABA
ACA
ADA
AAB
AAC
AAD
BBA BCA BDA
BAB BAC BAD
CBA CCA CDA
CAB CAC CAD
DBA DCA DDA
DAB DAC DAD
ABB ABC ABD
ACB ACC ACD
ADB ADC ADD
BBB CBB DBB
BCB CCB DCB
BDB CDB DDB
BBC CBC DBC
BCC CCC DCC
BDC CDC DDC
BBD CBD DBD
BCD CCD DCD
BDD CDD DDD
Let's consider a string of size n
.
We want to know how many 'neighbors' this string has, with a distance d
. The first thing we remark, with your definition of 'distance', is that it means that we must choose d
characters among the n
of the string and modify them. So there are n choose d
possible combinations of charactersto modify.
Each of these can be modified in 3
different manners (since the size of the alphabet is 4
.
So ultimately, we have:
n choose d
possible combinations of characters that will be modifiedd
characters will be modified, and each of them can be modified in 3
different manners.So the formula is ultimately (s - 1) ^ d * (n choose d)
, where s
is the size of the alphabet (here 4
). I let you verify that it fits the first examples you provided.