Search code examples
pythonmachine-learningpybrain

Pattern recognition with Pybrain


Is there a method for training pybrain to recognize multiple patterns within a single neural net? For example, I've added several permutations of two different patterns:

First pattern:

(200[1-9], 200[1-9]),(400[1-9],400[1-9])

Second pattern:

(900[1-9], 900[1-9]),(100[1-9],100[1-9])

Then for my unsupervised data set I added (90002, 90009), for which I was hoping it would return [100[1-9],100[1-9]] (second pattern) however it returns [25084, 25084]. I realize that its trying to find the best value given ALL the inputs, however I'm trying to have it distinquish certain patterns within the set if that makes sense.

This is the example I'm working from :

Request for example: Recurrent neural network for predicting next value in a sequence

from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.datasets import SupervisedDataSet,UnsupervisedDataSet
from pybrain.structure import LinearLayer
from pybrain.datasets import ClassificationDataSet
from pybrain.structure.modules.sigmoidlayer import SigmoidLayer
import random

ds = ClassificationDataSet(2, 1)

tng_dataset_size = 1000
unseen_dataset_size = 100
print 'training dataset size is ', tng_dataset_size
print 'unseen dataset size is ', unseen_dataset_size
print 'adding data..'
for x in range(tng_dataset_size):
   rand1 = random.randint(1,9)
   rand2 = random.randint(1,9)

   pattern_one_0 = int('2000'+str(rand1))
   pattern_one_1 = int('2000'+str(rand2))
   pattern_two_0 = int('9000'+str(rand1))
   pattern_two_1 = int('9000'+str(rand2))
   ds.addSample((pattern_one_0,pattern_one_1),(0))#pattern 1, maps to 0
   ds.addSample((pattern_two_0,pattern_two_1),(1))#pattern 2, maps to 1


unsupervised_results = []

net = buildNetwork(2, 1, 1, outclass=LinearLayer,bias=True, recurrent=True)
print 'training ...'
trainer = BackpropTrainer(net, ds)
trainer.trainEpochs(500)


ts = UnsupervisedDataSet(2,)
print 'adding pattern 2 to unseen data'
for x in xrange(unseen_dataset_size):
   pattern_two_0 = int('9000'+str(rand1))
   pattern_two_1 = int('9000'+str(rand1))

   ts.addSample((pattern_two_0, pattern_two_1))#adding first part of pattern 2 to unseen data
   a = [int(i) for i in net.activateOnDataset(ts)[0]]#should map to 1

   unsupervised_results.append(a[0])

print 'total hits for pattern 1 ', unsupervised_results.count(0)
print 'total hits for pattern 2 ', unsupervised_results.count(1)

[[EDIT]] added categorical variable and ClassificationDataSet.

[[EDIT 1]] added larger training set and unseen set


Solution

  • Yes, there is. The problem here is the representation you are choosing. You are training the network to output real numbers, so your NN is a function that approximates to a certain degree the function you sampled and provided in the dataset. Hence the result of some value between 10000 and 40000.

    It looks more like you are looking for a classifier. Given your description I am assuming you have a clearly defined set of patterns, that you are looking for. Then you must map your patterns to a categorical variable. For instance the pattern 1 you mention (200[1-9], 200[1-9]),(400[1-9],400[1-9]) would be 0, pattern 2 would be 1 and so on.

    Then, you train the network to output the class (0,1,...) to which the input pattern belongs. Arguably, given the structure of your patterns, rule-based classification is probably more adequate than ANNs.

    Concerning the amount of data, you need much more of it. Tipically, the most basic approach is to split the dataset into two groups (70-30, for instance). You use 70% of the samples for training, and the remaining 30% you use as unseen data (test data), to assess the generalization/over-fitting of the model. You might want to read about cross-validation once you get the basics running.