Search code examples
pythonpython-3.xtreehierarchical-clustering

Construct a tree from a list of data


Suppose I have a list of numbers like below,

a = [1,23,5,72,3,5,15,7,78,1,5,77,23]

Now I need to build a tree from these data.

  1. Divide the dataset into two parts according to the self-defined function par. Let's call these two parts a0, a1.
  2. Apply the function par on a0, a1 respectively, and get a00, a01, a11, a12.
  3. Repeat the process until there is only one number left at the end node.
  4. For each end node, we have a "binary code" like "0010100", where 0 represents left, and 1 represents right at each specific step.

I was trying to use the tree class like below, but I was stuck at the first place.

class Node(input):
    def __init__(self, data):
        self.data = data
        self.left = '*'
        self.right = '*'

Solution

  • It is not clear which values the internal nodes would have. It is maybe not important, but you could assign always the first value in the data array that belongs to that particular subtree.

    The "binary code" is like a navigation path from the root to the node. So for the data you presented, we would expect something like this tree:

    value:                                   1
    path:                       ____________ "" ________ 
                               /                        \
    value:                    1                         15
    path:                  __ 0 __                 _____ 1 _______
                          /       \               /               \
    value:              1         72             15                1
    path:               00        01             10             __ 11 __
                       /  \      /  \           /  \           /        \
    value:            1   23    72   3         15   7         1          77
    path:            000  001   010  011      100  101       110         111
                         /  \       /  \          /  \       /  \       /   \
    value:              23   5     3    5        7    78    1    5     77   23
    path:              0010 0011  0110 0111    1010  1011  1100 1101  1110 1111
    

    You can use a simple Node class, which can store both a value and a path:

    class Node:
        def __init__(self, value, path, left=None, right=None):
            self.value = value
            self.path = path
            self.left = left
            self.right = right
    

    The par function would do something like this:

    def par(data):
        i = len(data) // 2
        return (data[:i], data[i:]) if i else ()
    

    The if..else operator is used to return an empty list when there is only one data element. This will be useful in the main function:

    def maketree(data, path=""):
        return Node(data[0], path, *(
            maketree(part, path + str(i)) for i, part in enumerate(par(data))
        ))
    

    This enumerates the parts that are returned by par and passes those parts to the recursive call (if any parts are returned). At the same time that recursive call gets a path string that is extended with a 0 or 1 (i.e. the index of the enumeration).

    Example call:

    a = [1,23,5,72,3,5,15,7,78,1,5,77,23]
    root = maketree(a)
    
    # Output the properties of one particular node:
    node = root.left.left.right.left
    print("value: {}, path: {}".format(node.value, node.path))
    # Outputs: "value: 23, path: 0010"