Sorry for this rather simple question; however, there is still very little documentation on how to use CNTK, Microsoft's open-source AI library.
I keep seeing people set start to 1 for the reader's features while setting start to 0 for the labels. Shouldn't both always be 0, since indexing in computer science starts at zero? Wouldn't we lose one piece of information this way?
Example from the CIFAR10 02_BatchNormConv configuration:
features=[
    # dimension = 3 (RGB) * 32 (width) * 32 (height)
    dim=3072
    start=1
]
labels=[
    dim=1
    start=0
    labelDim=10
    labelMappingFile=$DataDir$/labelsmap.txt
]
Microsoft has recently updated this to get rid of the confusion and make the CNTK Definition Language more readable.
Instead of defining where the values start within each line, you can now name each group of values directly in the dataset itself:
|labels <tab separated values> | features <tab separated values> [EndOfLine/EOL]
If you want to reverse the order of features and labels, you can simply write:
|features <tab separated values> | labels <tab separated values> [EndOfLine/EOL]
You still have to define the dim value to specify how many values each input contains.
Note: There's no | at the end of the row. EOL indicates the end of the row.
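To make the layout concrete, here is a minimal Python sketch of how such a line could be split into named streams. It is not CNTK's reader: parse_ctf_line, DIMS and the sample values are hypothetical, and it assumes the values are whitespace-separated. It shows why the order of the streams does not matter and where the dim value still comes in.

```python
# Hypothetical helper, NOT part of CNTK: it only illustrates the
# "|streamName v1 v2 ... |streamName v1 v2 ..." line layout described above.

# Declared number of values per stream (plays the role of each stream's dim).
DIMS = {"labels": 2, "features": 3}


def parse_ctf_line(line, dims):
    """Split one line into {stream_name: [float, ...]}, checking each stream's length."""
    streams = {}
    # Every stream starts with '|'; there is no trailing '|' before the end of the line.
    for chunk in line.rstrip("\n").split("|")[1:]:
        parts = chunk.split()  # assumption: values are whitespace-separated
        if not parts:
            continue
        name, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dims[name]:
            raise ValueError(f"{name}: expected {dims[name]} values, got {len(values)}")
        streams[name] = values
    return streams


# The same sample written in both stream orders (made-up values).
line_a = "|labels\t0\t1 |features\t0.5\t0.25\t1.0\n"
line_b = "|features\t0.5\t0.25\t1.0 |labels\t0\t1\n"

print(parse_ctf_line(line_a, DIMS))  # {'labels': [0.0, 1.0], 'features': [0.5, 0.25, 1.0]}
print(parse_ctf_line(line_a, DIMS) == parse_ctf_line(line_b, DIMS))  # True
```

Because each group of values carries its own name, the parser never needs a start column; only the expected count per stream has to be declared.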
You are misunderstanding how the reader works. The UCIFastReader works on a file that contains tab-separated feature vectors. Each line in this file corresponds to one entry (an image, in this case) together with its classification.
So dim tells the reader how many columns to read, while start tells it from which column to start reading, counting from 0 (start=0 is the first column).
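A minimal Python sketch of these semantics, using the sample row 0 0 1 0 1 0 from the 2x2 example. It is a hypothetical illustration, not UCIFastReader's code; read_columns is an invented name, and it assumes 0-based column indexing, as the start values in this answer's example imply.

```python
# Hypothetical illustration of the start/dim semantics, NOT UCIFastReader's code.
# Assumption: columns are counted from 0, so start=0 is the first column.


def read_columns(line, start, dim):
    """Return `dim` values beginning at column `start` of a tab-separated line."""
    columns = line.rstrip("\n").split("\t")
    if start < 0 or start + dim > len(columns):
        raise ValueError("start/dim point outside the line")
    return [float(c) for c in columns[start:start + dim]]


row = "0\t0\t1\t0\t1\t0"  # a 2x2 image (4 pixel columns) followed by 2 label columns

features = read_columns(row, start=0, dim=4)
labels = read_columns(row, start=4, dim=2)
print(features)  # [0.0, 0.0, 1.0, 0.0]
print(labels)    # [1.0, 0.0]
```

In other words, start and dim together just select the slice [start, start + dim) of each line.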
So if you had images of size 2x2, with 2 label values for each, your file could have the form <image_pixel_columns><label_columns>:
0 0 0 0 0 0
0 0 1 0 1 0
...
So the first 4 entries in each line are your features (image pixel values), and the last two are your labels. Your reader configuration would then look like this:
reader=[
    readerType=UCIFastReader
    file=$DataDir$/Train.txt
    randomize=None
    features=[
        dim=4
        start=0
    ]
    labels=[
        dim=2
        start=4
        labelDim=10
        labelMappingFile=$DataDir$/labelsmap.txt
    ]
]
It's just that all the examples provided place the label in the first column (start=0), which is why their features start at column 1. No information is lost.
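Applied to the CIFAR10 configuration from the question, start=1 for the features simply follows from the label occupying column 0. The Python sketch below is a hypothetical check on a synthetic row (made-up values, 0-based columns assumed, read_columns is an invented helper); it shows that every column is read exactly once, and what would go wrong if the features also started at 0.

```python
# Hypothetical check of the CIFAR10 column layout, NOT CNTK code.
# Assumption: columns are counted from 0 and each line holds 1 label + 3072 pixels.

NUM_PIXELS = 3 * 32 * 32  # 3072, matching dim=3072 in the features section


def read_columns(columns, start, dim):
    """Return `dim` entries beginning at column `start`."""
    return columns[start:start + dim]


# Synthetic line with made-up values: column 0 is the label, columns 1..3072 the pixels.
row = [7] + list(range(NUM_PIXELS))

label = read_columns(row, start=0, dim=1)            # labels: start=0, dim=1
pixels = read_columns(row, start=1, dim=NUM_PIXELS)  # features: start=1, dim=3072
print(label)                  # [7]
print(len(pixels))            # 3072
print(label + pixels == row)  # True: every column is read exactly once

# Using start=0 for the features instead would read the label as a pixel
# and drop the last pixel column.
wrong = read_columns(row, start=0, dim=NUM_PIXELS)
print(wrong[0], wrong[-1])    # 7 3070
```

So the two start values are not an off-by-one convention; they just describe where each field sits in the line.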