Tags: huffman-code, huffman-tree

Huffman coding rules and optimization


All the sources I've read describe the following process for building Huffman codes:

  • Arrange the elements in ascending order of their frequencies.
  • Create a binary tree by repeatedly combining the two least frequent elements into a single node, with the sum of their frequencies as the frequency of the new node.
  • Assign '0' to the left branch and '1' to the right branch at each step.
  • The Huffman code for each element is obtained by traversing from the root of the tree to the respective leaf node, recording the path taken (0 for left, 1 for right).
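
For reference, here is a quick sketch of that process in Python; the heapq priority queue, the tie-breaking counter and the "0 for left, 1 for right" assignment are my own choices rather than anything mandated by the algorithm:

    import heapq

    def huffman_codes(freqs):
        """Build Huffman codes from a {symbol: frequency} map."""
        # Heap entries are (frequency, tie-breaker, tree); a tree is either a
        # symbol (leaf) or a (left, right) pair (internal node). The heap takes
        # care of the "ascending order of frequencies" step.
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # least frequent
            f2, _, right = heapq.heappop(heap)   # second least frequent
            heapq.heappush(heap, (f1 + f2, counter, (left, right)))
            counter += 1
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):          # internal node
                walk(node[0], prefix + "0")      # 0 for the left branch
                walk(node[1], prefix + "1")      # 1 for the right branch
            else:                                # leaf
                codes[node] = prefix or "0"      # single-symbol edge case
        _, _, root = heap[0]
        walk(root, "")
        return codes

    print(huffman_codes({"A": 1, "B": 2, "C": 3, "D": 4}))
    # -> {'D': '0', 'C': '10', 'A': '110', 'B': '111'}
    # The exact bit patterns depend on how ties are broken, but the code
    # lengths (1, 2, 3, 3) are the same as in the hand-built tree below.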

Let's try this for the following Frequency Map:

A -> 1 B -> 2 C -> 3 D -> 4

Making the Tree

        (ABCD, 10)
       /          \
   (ABC, 6)         D:4
   /     \
 (AB, 3)  C:3
 /    \
A:1   B:2

Following this tree to get the codes:

A -> 000 B -> 001 C -> 01 D -> 1

Obviously, the max depth and the max code length do not follow what I'd expect to be the logical rule of log2(nElements), which in this case should be two.

I would have expected the optimal codes to be:

A -> 00 B -> 01 C -> 10 D -> 11

So what's going on here? What am I doing wrong?


Solution

  • Imagine encoding the string "ABBCCCDDDD" - a sample string in which each symbol occurs with its given frequency.

    The Huffman encoding you came up with encodes it as 0000010010101011111 - 19 bits.

    The encoding you optimized for max code length gives 00010110101011111111 - 20 bits.

    Your mistake is thinking that max depth is what is being optimized (or what needs to be optimized); Huffman coding optimizes for the total length of the encoded text.

    And as a side note, as I mentioned in a comment, Huffman produces "an" optimal solution, not "the" optimal solution: I think it's possible for a lower-depth solution to be just as good, but it won't be better.
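
    As a quick way to check those two totals, here is a small Python snippet; the two code tables are simply the ones from the question, and the encode helper is just for illustration:

        # Encode the sample string with both code tables and compare the totals.
        huffman = {"A": "000", "B": "001", "C": "01", "D": "1"}
        fixed   = {"A": "00",  "B": "01",  "C": "10", "D": "11"}

        text = "ABBCCCDDDD"

        def encode(text, table):
            return "".join(table[ch] for ch in text)

        for name, table in [("Huffman", huffman), ("fixed-length", fixed)]:
            bits = encode(text, table)
            print(f"{name}: {bits} ({len(bits)} bits)")

        # Huffman: 0000010010101011111 (19 bits)
        # fixed-length: 00010110101011111111 (20 bits)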