So I'm trying to code something to tell me the number of elements in any given compound. I'm not even sure where to start: I tried coding something but then realized that it only worked for simple compounds (or did not work at all). Here's an example of what I want:
>>> function_input : 'NaMg3Al6(BO3)3Si6O18(OH)4', 'O'
>>> function_return : 31
I've come so far in my mess of a code (IT DOESN'T WORK, it just illustrates my rough thought process):
def get_pos_from_count(string: str, letter: str):
count = string.count(letter)
lens = [-1]
for i in range(count):
lens += [string[lens[i] + 1:].index(letter) + lens[i] + 1]
return lens[1:]
def find_number(string, letter):
if string.count(letter) == 0: return 0
numbers = '1234567890'
try:
mul1 = int(string[0])
except ValueError:
mul1 = 0
mul2 = []
sub_ = 1
list_of_positions = get_pos_from_count(string, letter)
for i in list_of_positions:
try:
sub_ += int(string[i + 1]) if string[i + 1] in numbers else 0
except IndexError: pass
if string[i + 1:].count(')') > string[i + 1].count('('):
try:
mul2 += int(string[string[i + 1:].count(')') + 1])
except (IndexError, ValueError): pass
return mul1 * sub_ * mul2
The approach I was trying to implement was:
But then I realized my code would either be extremely long or require recursion, which I don't know how to apply here.
If possible, I'd like a semi-working function, but a quick tip on how to approach this is also helpful!
And I don't want to use external libraries if possible.
tl;dr: This question for atomicity of elements, without external libraries (if possible).
EDIT: Yes, the question I linked does have hints on how to do this, but when I tried to make any code work for only one element, and set it's weight to 1, I ran into a host of issues I don't know to solve.
Let us divide the task into three parts:
Introducing my tools:
more_itertools.split_when
;collections.counter
.from more_itertools import split_when, pairwise
from itertools import chain
from collections import Counter
def nest_brackets(tokens, i = 0):
l = []
while i < len(tokens):
if tokens[i] == ')':
return i,l
elif tokens[i] == '(':
i,subl = nest_brackets(tokens, i+1)
l.append(subl)
else:
l.append(tokens[i])
i += 1
return i,l
def parse_compound(s):
tokens = [''.join(t) for t in split_when(s, lambda a,b: b.isupper() or b in '()' or (b.isdigit() and not a.isdigit()))]
tokens = [(int(t) if t.isdigit() else t) for t in tokens]
i, l = nest_brackets(tokens)
assert(i == len(tokens)) # crash if unmatched ')'
return l
def count_elems(parsed_compound):
c = Counter()
for a,b in pairwise(chain(parsed_compound, (1,))):
if not isinstance(a, int):
subcounter = count_elems(a) if isinstance(a, list) else {a: 1}
n = b if isinstance(b, int) else 1
for elem,k in subcounter.items():
c[elem] += k * n
return c
s = 'NaMg3Al6(B(CO2)3)3Si6O18(OH)4'
l = parse_compound(s)
print(l)
# ['Na', 'Mg', 3, 'Al', 6, ['B', ['C', 'O', 2], 3], 3, 'Si', 6, 'O', 18, ['O', 'H'], 4]
c = count_elems(l)
print(c)
# Counter({'O': 40, 'C': 9, 'Al': 6, 'Si': 6, 'Mg': 3, 'B': 3, 'Na': 1})
print(c['O'])
# 40