Consider the string aba.
All of its substrings are: [a,b,a,ab,ba,aba].
In the above list, character 'a' appears 6 times and b appears 4 times.
Is there any formula for calculating this? that is the number of occurrences of a character 'c' of s in all substrings of s.
In a string of length n, there are n(n+1)/2 substrings.
The character at index i appears once in every substring, except for the substrings that end at character i-1 or before, and except for the substrings that start at character i+1 or after. Note that here I'm only counting this specific occurrence of the character at index i.
Let me rephrase that: in a string S c T
, where S
and T
are two substrings and c
is a character, this occurrence of character c
appears once in every substring of S c T
, except in substrings of S
and in substrings of T
.
Thus the number of substrings in which appears the character at index i
in a string of length n
is:
k_i = n(n+1)/2 - i(i+1)/2 - (n-i-1)(n-i)/2
If we expand, cancel out terms, and refactor:
k_i = n(n+1)/2 - i(i+1)/2 - (n-i-1)(n-i)/2
k_i = (n(n+1) - i(i+1) - (n-i-1)(n-i)) / 2
k_i = (n²+n -i²-i -n²+ni+in-i²+n-i) / 2
k_i = (2n + 2ni -2i - 2i²) / 2
k_i = n + ni - i - i²
k_i = n(i+1) - i(i+1)
k_i = (i+1) (n-i)
We can sum on all occurrences of this character. In python, this gives the following function:
def count_occurences_in_substrings(s, c):
n = len(s)
return sum(
(i + 1) * (n - i)
for i, a in enumerate(s) if a == c
)
Testing by comparing against a bruteforce function that actually generates the substrings:
from itertools import combinations
def bruteforce(s, c):
return ''.join(s[i:j] for i,j in combinations(range(len(s)+1), 2)).count(c)
for s in ['aba', 'abcdefg', 'bcdaefg']:
k1 = count_occurences_in_substrings(s, 'a')
k2 = bruteforce(s, 'a')
print('{:8s}: {:2d},{:2d}'.format(s, k1, k2))
# aba : 6, 6
# abcdefg : 7, 7
# bcdaefg : 16,16