I stumbled across an interesting optimization question while writing some mathematical derivative functions for a neural network library. It turns out that the expression a / (b*c) takes longer to compute than a / b / c for large values (see the timeit runs below). But since the two expressions are mathematically equal, why doesn't Python evaluate a / (b*c) as a / b / c, given that it seems to be slower? Thanks in advance :)
In [2]: timeit.timeit('1293579283509136012369019234623462346423623462346342635610 / (52346234623632464236234624362436234612830128521357*32189512234623462637501237)')
Out[2]: 0.2646541080002862
In [3]: timeit.timeit('1293579283509136012369019234623462346423623462346342635610 / 52346234623632464236234624362436234612830128521357 / 32189512234623462637501237')
Out[3]: 0.008390166000026511
Why is a/(b*c) slower?
(b*c) multiplies two very big ints with unlimited precision. That is a more expensive operation than performing a floating-point division (which has limited precision).
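As a rough sketch (not part of the original answer), the cost of the individual sub-operations can be compared by timing them separately. The variables below mirror the question's a, b and c, the globals= argument just makes those names visible to timeit, and the absolute numbers will vary from machine to machine:

import timeit

a = 1293579283509136012369019234623462346423623462346342635610
b = 52346234623632464236234624362436234612830128521357
c = 32189512234623462637501237

print(timeit.timeit('b * c', globals=globals()))        # the big-int multiplication on its own
print(timeit.timeit('a / (b * c)', globals=globals()))  # multiplication followed by a single division
print(timeit.timeit('a / b / c', globals=globals()))    # two divisions, no big-int product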
Are the two calculations equivalent?
In practice, a/(b*c) and a/b/c can give different results, because floating-point calculations have inaccuracies, and doing the operations in a different order can produce a different result.
For example:
a = 10 ** 33
b = 10000000000000002
c = 10 ** 17
print(a / b / c) # 0.9999999999999999
print(a / (b * c)) # 0.9999999999999998
It boils down to how a computer deals with the numbers it uses.
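One way to make that concrete (a small sketch, not part of the original answer) is to compute the exact quotient with the standard-library fractions module, using the same a, b and c as above, and compare it with the two floating-point results:

from fractions import Fraction

a = 10 ** 33
b = 10000000000000002
c = 10 ** 17

exact = Fraction(a, b * c)   # the quotient as an exact rational number
print(float(exact))          # 0.9999999999999998 (the exact value, rounded once to a float)
print(a / (b * c))           # 0.9999999999999998 (one rounding step)
print(a / b / c)             # 0.9999999999999999 (two rounding steps)

Here the single-division form happens to match the exact value after rounding, while the two-division form rounds twice and lands one unit in the last place away.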
Why doesn't Python calculate a/(b*c) as a/b/c?
That would give surprising results. The user ought to be able to expect that
d = b*c
a / d
should have the same result as a / (b*c), so it would be a source of very mysterious behaviour if a / (b*c) gave a different result because it was magically replaced by a / b / c.
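A quick check with the numbers from the earlier example (the name d is just for illustration) shows why such a rewrite would be surprising:

a = 10 ** 33
b = 10000000000000002
c = 10 ** 17

d = b * c
print(a / d == a / (b * c))   # True: naming the intermediate product changes nothing
print(a / d == a / b / c)     # False: rewriting a / (b*c) as a / b / c would silently change the result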