
How to add an Image Resizing transform to CNTK


I want to add an image transform (I call it ResizeTransformer) that:

  • Resizes the smaller dimension of an image to a given size, while preserving the original aspect ratio

To achieve this without implementing a separate ResizeTransformer, I want to modify the class ScaleTransformer : public ImageTransformerBase in this file. However, this class implements StreamInformation ScaleTransformer::Transform(const StreamInformation& inputStream) so that all samples in the stream have the same size. My questions are:

  1. Why is implementing this function necessary? Does it add a performance benefit, or is it needed for more basic reasons?

  2. Do I have to implement ResizeTransformer as a separate class?

  3. If so, do I have to implement StreamInformation ResizeTransformer::Transform(const StreamInformation& inputStream)?

Need for this transform

This transform is needed because the images in a dataset can all have different sizes, and one may want to extract multiple patches from each image. In that case, a good approach is to resize the smaller dimension of the image to a size S that is greater than the crop size C, and then extract multiple patches of size C from it. This kind of data augmentation is used in some papers I know of.
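As a rough illustration of the patch-extraction step described above, the sketch below computes crop boxes of size C for an image whose shorter side has already been resized to S. It is not part of CNTK: the function name `five_crop_boxes` and the choice of four corner crops plus one center crop are assumptions made for this example only.

```python
def five_crop_boxes(height, width, crop):
    """Return (top, left, bottom, right) boxes for four corner crops
    and one center crop, each of size crop x crop.

    Hypothetical helper for illustration; the crop layout is an assumption.
    """
    if crop <= 0:
        raise ValueError("crop size must be positive")
    if crop > height or crop > width:
        raise ValueError("crop size must not exceed either image dimension")

    center_top = (height - crop) // 2
    center_left = (width - crop) // 2
    origins = [
        (0, 0),                            # top-left corner
        (0, width - crop),                 # top-right corner
        (height - crop, 0),                # bottom-left corner
        (height - crop, width - crop),     # bottom-right corner
        (center_top, center_left),         # center
    ]
    return [(top, left, top + crop, left + crop) for top, left in origins]


# Image resized so that the shorter side is S = 256, cropped with C = 224.
boxes = five_crop_boxes(256, 341, 224)
```

Every box has the same C x C size regardless of the original image size, which is the reason for resizing to S > C first.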

PS: I made the following additions in an effort to add a ResizeTransformer.

I am confused about how to test it. Compilation succeeded, so the C++ code at least builds, but I would like to use it from Python.

Additions to the header file in my system:

class ResizeTransformer : public ImageTransformerBase
 {
 public:
   explicit ResizeTransformer(const Microsoft::MSR::CNTK::ConfigParameters& config);

 private:
   enum class ResizeMode
   {
     ResizeMin = 0,
     ResizeMax = 1  // must differ from ResizeMin, otherwise the two modes compare equal
   };

   ResizeMode resize_mode;
   size_t resized_length;
   void Apply(uint8_t copyId, cv::Mat &mat) override;
 };

And to the source file :

ResizeTransformer::ResizeTransformer(const ConfigParameters& config) : ImageTransformerBase(config)
{
  resized_length = config(L"resized_length");
  if (resized_length <= 0)
    RuntimeError("Cannot resize any dimension of an image to zero or negative number.");

  string resize_type = config(L"resize_type", "ResizeMin");
  if (resize_type == "ResizeMin")
    resize_mode = ResizeMode::ResizeMin;
  else if (resize_type == "ResizeMax")
    resize_mode = ResizeMode::ResizeMax;
  else RuntimeError("Invalid resize_type. Must be one of ResizeMin and ResizeMax");
}

void ResizeTransformer::Apply(uint8_t, cv::Mat &mat)
{
  float height = mat.rows;
  float width = mat.cols;
  float aspectratio = height/width;
  float newheight{};
  float newwidth{};
  if (resize_mode == ResizeMode::ResizeMin)
  {
    // Pin the shorter side to resized_length and scale the longer side.
    if (height <= width)
    {
      newheight = resized_length;
      newwidth = newheight/aspectratio;
    }
    else
    {
      newheight = aspectratio * resized_length;
      newwidth = resized_length;
    }
  }
  else
  {
    // Pin the longer side to resized_length and scale the shorter side.
    if (height <= width)
    {
      newheight = aspectratio * resized_length;
      newwidth = resized_length;
    }
    else
    {
      newheight = resized_length;
      newwidth = newheight/aspectratio;
    }
  }
  cv::resize(mat, mat, cv::Size2f(newwidth, newheight));
}
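
To check what output size to expect from the Apply method above, the same arithmetic can be mirrored in plain Python. This is only a reference sketch: the function name `resized_shape` is hypothetical, and rounding the scaled side to the nearest integer is an assumption about how the float size is converted for cv::resize.

```python
def resized_shape(height, width, resized_length, resize_type="ResizeMin"):
    """Return (new_height, new_width) following the same rules as Apply.

    Hypothetical reference helper; rounding to nearest is an assumption.
    """
    if height <= 0 or width <= 0 or resized_length <= 0:
        raise ValueError("height, width and resized_length must be positive")
    if resize_type not in ("ResizeMin", "ResizeMax"):
        raise ValueError("resize_type must be 'ResizeMin' or 'ResizeMax'")

    aspect_ratio = height / width
    # ResizeMin pins the shorter side, ResizeMax pins the longer side.
    if resize_type == "ResizeMin":
        pin_height = height <= width
    else:
        pin_height = height > width

    if pin_height:
        new_height = resized_length
        new_width = new_height / aspect_ratio
    else:
        new_width = resized_length
        new_height = aspect_ratio * resized_length
    return round(new_height), round(new_width)


landscape_min = resized_shape(480, 640, 256)               # shorter side (height) pinned
landscape_max = resized_shape(480, 640, 256, "ResizeMax")  # longer side (width) pinned
```

A helper like this can supply expected shapes when writing a test for the transform.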

I added the following line to this file

transformations.push_back(Transformation{ std::make_shared<ResizeTransformer>(featureStream), featureName });

Then I added the following to this file

CNTK_API ImageTransform ReaderResize(int resized_length,
                                         const wchar_t* resize_type = L"ResizeMin");

Finally I added the following function to this file

def resize(resized_length, resize_type='ResizeMin'):
    '''
    Resize transform that can be passed to `map_features`.
    Given an input image, it resizes one dimension to a fixed size
    (resized_length) while preserving the aspect ratio.


    Args:
        resized_length (int): A positive integer. It is the resized value of the
           dimension which has to be resized. The other dimension is resized while
           maintaining the aspect ratio.
        resize_type (str, default 'ResizeMin'): 'ResizeMin' or 'ResizeMax'.
           When 'ResizeMin', the smaller dimension of the image is resized to a fixed size
           given by resized_length, with the larger dimension resized in a way to preserve
           the original aspect ratio. When 'ResizeMax', the same operation is performed
           but now the larger dimension of the image is resized to a fixed size.
    Returns:
        A dictionary-like object describing the ResizeTransform.
    '''
    return cntk_py.reader_resize(resized_length, resize_type)

Solution

  • 1) This allows upper layers to define buffers ahead of time where possible. So if you know that you will resize to (x, y), you can define the output stream shape there (similar to ScaleTransformer). Otherwise, you can set the image layout in the Transform(SequenceDataPtr) method (or in Apply, if you derive from the ImageTransformerBase class).

    2) You can, or you can change the ScaleTransformer to do what you need (just take another parameter in the configuration).

    3) If you implement your own ResizeTransformer, you can simply put NDShape::Unknown in the transform, something like:

    StreamInformation ResizeTransformer::Transform(
        const StreamInformation& inputStream) 
    {
         TransformBase::Transform(inputStream);
         m_outputStream.m_sampleLayout = NDShape::Unknown();
         return m_outputStream; 
    }
    

    PS. The code looks OK, though you probably still need to add the Transform override on inputStream as described above. Also note that by the time images reach the core network, all of them must have the same dimensions. Deserializers do not support images of different shapes.

    If you want to expose the ResizeTransformer you will need to do the following:

    1) Implement ResizeTransformer (as discussed above; you did this)

    2) In ImageReader/Exports.cpp, add resolution by name to the CreateTransformer function, i.e.

    else if (type == L"Resize")
            *transformer = new ResizeTransformer(config);
    

    (this step seems to be missing on your side)

    3) Add a factory method to the C++ API in CNTKLibrary.h/MinibatchSource.cpp; see the scale transform (ReaderScale) as an example (you did this): ImageTransform ReaderResize(...) {...}

    4) Implement a Python wrapper with parameter checking, etc. in bindings/python/cntk/io/transforms.py (you did this): def resize(...):
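
The parameter checking mentioned in step 4 could look roughly like the sketch below. It is an assumption about what such checks might be, not existing CNTK code: the helper name `check_resize_args` and the specific error types are hypothetical, and the real wrapper would call this before handing the values to the C++ factory.

```python
VALID_RESIZE_TYPES = ("ResizeMin", "ResizeMax")


def check_resize_args(resized_length, resize_type="ResizeMin"):
    """Validate arguments for a resize transform wrapper.

    Hypothetical helper for illustration; returns the validated values.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(resized_length, bool) or not isinstance(resized_length, int):
        raise TypeError("resized_length must be an int")
    if resized_length <= 0:
        raise ValueError("resized_length must be a positive integer")
    if resize_type not in VALID_RESIZE_TYPES:
        raise ValueError(
            "resize_type must be one of %s" % ", ".join(VALID_RESIZE_TYPES))
    return resized_length, resize_type


args = check_resize_args(256, "ResizeMax")
```

Failing early in Python gives a clearer error than letting an invalid value reach the RuntimeError in the C++ constructor.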

    Then, if you recompile and set PATH to your local build (/x64/Release) of CNTK and PYTHONPATH to /bindings/python, you should be able to use your new transform. You can add your tests to io/tests, then go to /bindings/python/cntk and simply run "pytest".

    I may have forgotten something, so if you run into any issues, please ask the CNTK team; they should be able to help.

    Thanks!