I have written a program that generates random expressions and then uses genetic techniques to select for fitness.
The following part of the program generates the random expression and stores it in a tree structure.
As this can get called billions of times during a run, I thought it should be optimized for time.
I'm new to programming and I work (play) by myself so, as much as I search on the inernet for
ideas, I'd like some input as I feel like I'm doing this in isolation.
The bottlenecks seem to be Node.init (), (22% of total time) and random.choice(), (14% of total time)
import random
def printTreeIndented(data, level=0):
'''utility to view the tree
'''
if data == None:
return
printTreeIndented(data.right, level+1)
print ' '*level + ' '+ str(data.cargo)#+ ' '+ str(data.seq)+ ' '+ str(data.branch)
printTreeIndented(data.left, level+1)
#These are the global constants used in the Tree.build_nodes() method.
Depth = 5
Ratio = .6 #probability of terminating the current branch.
Atoms = ['1.0','2.0','3.0','4.0','5.0','6.0','7.0','8.0','9.0','x','x','x','x']
#dict of operators. the structure is: operator: number of arguements
Operators = {'+': 2, '-': 2, '*': 2, '/': 2, '**': 2}
class KeySeq:
'''Iterator to produce sequential
integers for keys in Tree.thedict
'''
def __init__(self, data = 0):
self.data = data
def __iter__(self):
return self
def next(self):
self.data = self.data + 1
return self.data
KS = KeySeq()
class Node(object):
'''
'''
def __init__(self, cargo, left=None, right=None):
object.__init__(self)
self.isRoot = False
self.cargo = cargo
self.left = left
self.right = right
self.parent = None
self.branch = None
self.seq = 0
class Tree(object):
def __init__(self):
self.thedict = {} #provides access to the nodes for further mutation and
# crossbreeding.
#When the Tree is instantiated, it comes filled with data.
self.data = self.build_nodes()
# Uncomment the following lines to see the data and a crude graphic of the tree.
# print 'data: '
# for v in self.thedict.itervalues():
# print v.cargo,
# print
# print
# printTreeIndented(self.data)
def build_nodes (self, depth = Depth, entry = 1, pparent = None,
bbranch = None):
'''
'''
r = float()
r = random.random()
#If r > Ratio, it forces a terminal node regardless of
#the value of depth.
#If entry = 1, then it's the root node and we don't want
# a tree with just a value in the root node.
if (depth <= 0) or ((r > Ratio) and (not (entry))):
'''
Add a terminal node.
'''
this_atom = (random.choice(Atoms))
this_atom = str(this_atom)
this_node = Node(this_atom)
this_node.parent = pparent
this_node.branch = bbranch
this_node.seq = KS.next()
self.thedict[this_node.seq] = this_node
return this_node
else:
'''
Add a node that has branches.
'''
this_operator = (random.choice(Operators.keys()))
this_node = Node(this_operator)
if entry:
this_node.isRoot = True
this_node.parent = pparent
this_node.branch = bbranch
this_node.seq = KS.next()
self.thedict[this_node.seq] = this_node
#branch as many times as 'number of arguements'
# it's only set up for 2 arguements now.
for i in range(Operators[this_operator]):
depth =(depth - 1)
if i == 0:
this_node.left = (self.build_nodes(entry = 0, depth =(depth),
pparent = this_node, bbranch = 'left'))
else:
this_node.right = (self.build_nodes(entry = 0, depth =(depth),
pparent = this_node, bbranch = 'right'))
return this_node
def Main():
for i in range(100000):
t = Tree()
return t
if __name__ == '__main__':
rresult = Main()
Below, I've summarized some of the more obvious optimization efforts, without really touching the algorithm much. All timings are done with Python 2.6.4 on a Linux x86-64 system.
Initial time: 8.3s
jellybean already pointed some out. Just fixing those already improves the runtime a little bit. Replacing the repeated calls to Operators.keys()
by using the same list again and again also saves some time.
Time: 6.6s
Pointed out by Dave Kirby, simply using itertools.count
also saves you some time:
from itertools import count
KS = count()
Time: 6.2s
Since you're not setting all attributes of Node
in the ctor, you can just move the attribute declarations into the class body:
class Node(object):
isRoot = False
left = None
right = None
parent = None
branch = None
seq = 0
def __init__(self, cargo):
self.cargo = cargo
This does not change the semantics of the class as far as you're concerned, since all values used in the class body are immutable (False
, None
, 0
), if you need other values, read this answer on class attributes first.
Time: 5.2s
In your code, you're not changing the expression tree any more, so you might as well use an object that is immutable. Node
also does not have any behavior, so using a namedtuple
is a good option. This does have an implication though, since the parent
member had to be dropped for now. Judging from the fact that you might introduce operators with more than two arguments, you would have to replace left/right with a list of children anyway, which is mutable again and would allow creating the parent node before all the children.
from collections import namedtuple
Node = namedtuple("Node", ["cargo", "left", "right", "branch", "seq", "isRoot"])
# ...
def build_nodes (self, depth = Depth, entry = 1, pparent = None,
bbranch = None):
r = random.random()
if (depth <= 0) or ((r > Ratio) and (not (entry))):
this_node = Node(
random.choice(Atoms), None, None, bbranch, KS.next(), False)
self.thedict[this_node.seq] = this_node
return this_node
else:
this_operator = random.choice(OpKeys)
this_node = Node(
this_operator,
self.build_nodes(entry = 0, depth = depth - 1,
pparent = None, bbranch = 'left'),
self.build_nodes(entry = 0, depth = depth - 2,
pparent = None, bbranch = 'right'),
bbranch,
KS.next(),
bool(entry))
self.thedict[this_node.seq] = this_node
return this_node
I've kept the original behavior of the operand loop, that decrements the depth at each iteration. I'm not sure this is wanted behavior, but changing it increases runtime and therefore makes comparison impossible.
Final time: 4.1s
If you want to have support for more than two operators and/or support for the parent attribute, use something along the lines of the following code:
from collections import namedtuple
Node = namedtuple("Node", ["cargo", "args", "parent", "branch", "seq", "isRoot"])
def build_nodes (self, depth = Depth, entry = 1, pparent = None,
bbranch = None):
r = random.random()
if (depth <= 0) or ((r > Ratio) and (not (entry))):
this_node = Node(
random.choice(Atoms), None, pparent, bbranch, KS.next(), False)
self.thedict[this_node.seq] = this_node
return this_node
else:
this_operator = random.choice(OpKeys)
this_node = Node(
this_operator, [], pparent, bbranch,
KS.next(), bool(entry))
this_node.args.extend(
self.build_nodes(entry = 0, depth = depth - (i + 1),
pparent = this_node, bbranch = i)
for i in range(Operators[this_operator]))
self.thedict[this_node.seq] = this_node
return this_node
This code also decreases the depth with the operator position.