cntk

Improve error message for ill-formed input format?


I have a map file containing data like this:

|labels 0 0 1 0 0 0 |features 0
|labels 1 0 0 0 0 0 |features 2
|labels 0 0 0 1 0 0 |features 3
|labels 0 0 0 0 0 1 |features 7

Data is read into a minibatch with the following code:

from cntk import StreamConfiguration, text_format_minibatch_source

# Each label vector in test_map2.txt has 6 entries; features is a single value
num_classes = 6

# Map the 'features' and 'labels' streams in the file to their dimensions
mb_source = text_format_minibatch_source('test_map2.txt', [
    StreamConfiguration('features', 1),
    StreamConfiguration('labels', num_classes)])

test_minibatch = mb_source.next_minibatch(2)

If the input file is ill-formed, you will sometimes get a rather cryptic error message. For example, a missing line break at the end of the last row in the input file results in an error like this:
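Until the reader reports this case more clearly, one possible workaround is to make sure the file ends with a line break before handing it to the reader. The sketch below is a hypothetical helper (`ensure_trailing_newline` is not part of CNTK) that uses only the Python standard library and appends a newline only when the last byte is not already one. It sticks to `%`-free, pre-3.6 syntax so it also runs on the Python 3.4 environment shown in the traceback.

```python
import os


def ensure_trailing_newline(path):
    """Append b'\\n' to the file at `path` if its last byte is not a newline.

    Hypothetical helper, not a CNTK API. Returns True if the file was
    modified and False otherwise (including for an empty file).
    """
    with open(path, 'rb+') as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: nothing to fix
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b'\n':
            return False  # already ends with a line break (also covers \r\n)
        f.seek(0, os.SEEK_END)  # explicit seek before switching from read to write
        f.write(b'\n')
        return True
```

Calling `ensure_trailing_newline('test_map2.txt')` before creating the minibatch source is enough; a second call on the same file is a no-op and returns `False`.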

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-35-2f1481ccfced> in <module>()
----> 1 test_minibatch = mb_source.next_minibatch(2)

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\utils\swig_helper.py in wrapper(*args, **kwds)
     56     @wraps(f)
     57     def wrapper(*args, **kwds):
---> 58         result = f(*args, **kwds)
     59         map_if_possible(result)
     60         return result

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\io\__init__.py in next_minibatch(self, minibatch_size_in_samples, input_map, device)
    159 
    160         mb = super(MinibatchSource, self).get_next_minibatch(
--> 161                 minibatch_size_in_samples, device)
    162 
    163         if input_map:

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\cntk_py.py in get_next_minibatch(self, *args)
   1914 
   1915     def get_next_minibatch(self, *args):
-> 1916         return _cntk_py.MinibatchSource_get_next_minibatch(self, *args)
   1917 MinibatchSource_swigregister = _cntk_py.MinibatchSource_swigregister
   1918 MinibatchSource_swigregister(MinibatchSource)

RuntimeError: Invalid chunk requested.

It can be hard to figure out where in the file the problem is. Would it be possible to emit a more specific error message? Including the line number in the input file would be useful.
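In the meantime, a pre-check on the user side can report line numbers. The following is a minimal sketch, not the CNTK reader's own validation: `validate_ctf` is a hypothetical helper, and it assumes the simple dense layout shown above (`|name v1 v2 ...` per stream, one sample per line). It does not understand sequence ids, sparse `index:value` entries, or comments, so it may flag lines the real reader accepts. It returns `(line_number, message)` tuples for missing trailing line breaks, unknown stream names, non-numeric values, and value counts that differ from the expected dimension.

```python
def validate_ctf(path, expected_dims):
    """Check a simple dense CTF-style text file.

    expected_dims maps stream name -> number of values, e.g.
    {'labels': 6, 'features': 1}. Returns a list of (line_number, message).
    """
    problems = []
    # newline='' keeps the original line endings so a missing '\n' is visible
    with open(path, 'r', newline='') as f:
        lines = f.readlines()

    for lineno, raw in enumerate(lines, start=1):
        if not raw.endswith('\n'):
            problems.append((lineno, 'missing line break at end of line'))
        line = raw.rstrip('\r\n')
        if not line.strip():
            continue  # blank lines are skipped, not judged
        if not line.lstrip().startswith('|'):
            problems.append((lineno, 'line does not start with "|"'))
            continue

        for field in line.split('|')[1:]:
            parts = field.split()
            if not parts:
                problems.append((lineno, 'empty stream field'))
                continue
            name, values = parts[0], parts[1:]
            if name not in expected_dims:
                problems.append((lineno, 'unknown stream %r' % name))
                continue
            try:
                [float(v) for v in values]
            except ValueError:
                problems.append((lineno, 'non-numeric value in stream %r' % name))
                continue
            if len(values) != expected_dims[name]:
                problems.append((lineno, 'stream %r has %d values, expected %d'
                                 % (name, len(values), expected_dims[name])))
    return problems
```

For the file in the question, `validate_ctf('test_map2.txt', {'labels': 6, 'features': 1})` would return an empty list for a well-formed file, and a `(4, 'missing line break at end of line')` entry if the last row lacked its line break.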


Solution

  • Thanks for reporting the issue. We have filed a bug and will be working on fixing the reader behavior with ill-formed input.