I've trained a VAE in PyTorch that I need to convert to CoreML. From this thread, PyTorch VAE fails conversion to onnx, I was able to get the ONNX model to export; however, this just pushed the problem one step further, to the ONNX-to-CoreML stage.
The original function that contains the torch.randn() call is the reparametrize function:
def reparametrize(self, mu, logvar):
    std = logvar.mul(0.5).exp_()
    if self.have_cuda:
        eps = torch.randn(self.bs, self.nz, device='cuda')
    else:
        eps = torch.randn(self.bs, self.nz)
    return eps.mul(std).add_(mu)
The solution is, of course, to create a custom layer, but I'm having problems creating a layer with no inputs (i.e., it's just a randn() call).
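(As an aside: one way to avoid a no-input op entirely is to make eps an explicit forward argument at export time, so the exported graph never contains RandomNormal. A minimal sketch of that idea, assuming the surrounding model stays otherwise unchanged:)
def reparametrize(self, mu, logvar, eps=None):
    # Sketch: if eps is supplied (e.g., during torch.onnx.export),
    # no randn() call is traced into the graph
    std = logvar.mul(0.5).exp_()
    if eps is None:
        eps = torch.randn_like(std)  # normal training/inference path
    return eps.mul(std).add_(mu)     # z = mu + sigma * eps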
I can get the CoreML conversion to complete with this def:
from coremltools.proto import NeuralNetwork_pb2

def convert_randn(node):
    params = NeuralNetwork_pb2.CustomLayerParams()
    params.className = "RandomNormal"
    params.description = "Random normal distribution generator"
    params.parameters["dtype"].intValue = node.attrs.get('dtype', 1)
    params.parameters["bs"].intValue = node.attrs.get("shape")[0]
    params.parameters["nz"].intValue = node.attrs.get("shape")[1]
    return params
I do the conversion with:
from onnx_coreml import convert

coreml_model = convert(onnx_model, add_custom_layers=True,
                       image_input_names=['input'],
                       custom_conversion_functions={"RandomNormal": convert_randn})
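(To sanity-check the conversion, it's possible to inspect the converted spec and confirm the custom layer and its parameters survived -- a quick sketch:)
# Sketch: verify the custom layer landed in the converted spec
spec = coreml_model.get_spec()
for layer in spec.neuralNetwork.layers:
    if layer.WhichOneof("layer") == "custom":
        print(layer.name, layer.custom.className, dict(layer.custom.parameters))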
I should also note that, at the completion of the mlmodel export, the following is printed:
Custom layers have been added to the CoreML model corresponding to the
following ops in the onnx model:
1/1: op type: RandomNormal, op input names and shapes: [], op output
names and shapes: [('62', 'Shape not available')]
Bringing the .mlmodel into Xcode, it complains: Layer '62' of type 500 has 0 inputs but expects at least 1.
So I'm wondering how to specify a kind of "dummy" input to the layer, since it doesn't actually have an input -- it's just a wrapper around torch.randn() (or, more specifically, the ONNX RandomNormal op). I should clarify that I do need the whole VAE, not just the decoder, as I'm actually using the entire process to "error correct" my inputs (i.e., the encoder estimates my z vector based on an input, then the decoder generates the closest generalizable prediction of that input).
Any help greatly appreciated.
UPDATE: Okay, I finally got a version to load in Xcode (thanks to @MattijsHollemans and his book!). The originalConversion.mlmodel is the initial output of converting my model from ONNX to CoreML. To this, I had to manually insert the input for the RandomNormal layer. I made it (64, 28, 28) for no great reason -- I know my batch size is 64, and my inputs are 28 x 28 (but presumably it could also be (1, 1, 1), since it's a "dummy"):
import coremltools
from coremltools.proto import FeatureTypes_pb2 as ft

spec = coremltools.utils.load_spec('originalConversion.mlmodel')
nn = spec.neuralNetwork
layers = {l.name: i for i, l in enumerate(nn.layers)}
layer_idx = layers["62"]  # '62' is the name of the layer -- see above
layer = nn.layers[layer_idx]
layer.input.extend(["dummy_input"])

inp = spec.description.input.add()
inp.name = "dummy_input"
inp.type.multiArrayType.SetInParent()
inp.type.multiArrayType.shape.extend([64, 28, 28])
inp.type.multiArrayType.dataType = ft.ArrayFeatureType.DOUBLE

coremltools.utils.save_spec(spec, "modelWithInsertedInput.mlmodel")
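(A quick sanity check that the edit stuck -- reload the saved spec and print the declared inputs:)
# Sketch: confirm "dummy_input" now appears among the model's inputs
spec = coremltools.utils.load_spec("modelWithInsertedInput.mlmodel")
for inp in spec.description.input:
    print(inp.name, inp.type.WhichOneof("Type"))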
This loads in Xcode, but I have yet to test the functioning of the model in my app. Since the additional layer is simple, and the input is literally a bogus, non-functional input (just to keep Xcode happy), I don't imagine it will be a problem, but I'll post again if it doesn't run properly.
UPDATE 2: Unfortunately, the model doesn't load at runtime. It fails with [espresso] [Espresso::handle_ex_plan] exception=Failed in 2nd reshape after missing custom layer info.
What I find very strange and confusing is that, inspecting model.espresso.shape, I see that almost every node has a shape like:
"62" : {
"k" : 0,
"w" : 0,
"n" : 0,
"seq" : 0,
"h" : 0
}
I have two questions/concerns: 1) Most obviously, why are all the values zero (this is the case with all but the input nodes), and 2) why does it appear to be a sequential model, when it's just a fairly conventional VAE? Opening model.espresso.shape for a fully-functioning GAN in the same app, I see that the nodes are of the format:
"54" : {
"k" : 256,
"w" : 16,
"n" : 1,
"h" : 16
}
That is, they contain reasonable shape info, and they don't have seq fields.
Very, very confused...
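(For reference, model.espresso.shape is plain JSON inside the compiled .mlmodelc bundle, so it can be dumped directly; the path below is a placeholder for wherever your build puts the compiled model:)
# Sketch: pretty-print the shape file from a compiled .mlmodelc bundle
import json
import pprint

with open("path/to/model.mlmodelc/model.espresso.shape") as f:
    pprint.pprint(json.load(f))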
UPDATE 3: I've also just noticed in the compiler report the error: IMPORTANT: new sequence length computation failed, falling back to old path. Your compilation was sucessful, but please file a radar on Core ML | Neural Networks and attach the model that generated this message.
Here's the original PyTorch model:
class VAE(nn.Module):
    def __init__(self, bs, nz):
        super(VAE, self).__init__()
        self.nz = nz
        self.bs = bs
        self.encoder = nn.Sequential(
            # input is (nc) x 28 x 28
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # size = (ndf) x 14 x 14
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # size = (ndf*2) x 7 x 7
            nn.Conv2d(ndf * 2, ndf * 4, 3, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # size = (ndf*4) x 4 x 4
            nn.Conv2d(ndf * 4, 1024, 4, 1, 0, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.decoder = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(1024, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # size = (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 3, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # size = (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # size = (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, nc, 4, 2, 1, bias=False),
            nn.Sigmoid()
        )
        self.fc1 = nn.Linear(1024, 512)
        self.fc21 = nn.Linear(512, nz)
        self.fc22 = nn.Linear(512, nz)
        self.fc3 = nn.Linear(nz, 512)
        self.fc4 = nn.Linear(512, 1024)
        self.lrelu = nn.LeakyReLU()
        self.relu = nn.ReLU()

    def encode(self, x):
        conv = self.encoder(x)
        h1 = self.fc1(conv.view(-1, 1024))
        return self.fc21(h1), self.fc22(h1)

    def decode(self, z):
        h3 = self.relu(self.fc3(z))
        deconv_input = self.fc4(h3)
        deconv_input = deconv_input.view(-1, 1024, 1, 1)
        return self.decoder(deconv_input)

    def reparametrize(self, mu, logvar):
        std = logvar.mul(0.5).exp_()
        eps = torch.randn(self.bs, self.nz, device='cuda')  # needs custom layer!
        return eps.mul(std).add_(mu)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparametrize(mu, logvar)
        decoded = self.decode(z)
        return decoded, mu, logvar
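(For completeness, the full export pipeline described above looks roughly like this -- a sketch, with nc, nz, and the file names as placeholder assumptions:)
# Sketch of the PyTorch -> ONNX -> CoreML pipeline (names are placeholders)
import torch
import onnx
from onnx_coreml import convert

nc, nz = 1, 32  # assumptions: single-channel 28 x 28 inputs, latent size 32
model = VAE(bs=64, nz=nz).eval()
dummy = torch.randn(64, nc, 28, 28)
torch.onnx.export(model, dummy, "vae.onnx")

onnx_model = onnx.load("vae.onnx")
coreml_model = convert(onnx_model, add_custom_layers=True,
                       image_input_names=['input'],
                       custom_conversion_functions={"RandomNormal": convert_randn})
coreml_model.save("vae.mlmodel")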
To add an input to your Core ML model, you can do the following from Python:
import coremltools
spec = coremltools.utils.load_spec("YourModel.mlmodel")
nn = spec.neuralNetworkClassifier # or just spec.neuralNetwork
layers = {l.name: i for i, l in enumerate(nn.layers)}
layer_idx = layers["your_custom_layer"]
layer = nn.layers[layer_idx]
layer.input.extend(["dummy_input"])
inp = spec.description.input.add()
inp.name = "dummy_input"
inp.type.doubleType.SetInParent()
coremltools.utils.save_spec(spec, "NewModel.mlmodel")
Here, "your_custom_layer"
is the name of the layer you want to add the dummy input to. In your model it looks like it's called 62
. You can look at the layers
dictionary to see the names of all the layers in the model.
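(For example, a short way to dump every layer's name and type:)
# Sketch: list all layer names and types in the model
import coremltools

spec = coremltools.utils.load_spec("YourModel.mlmodel")
nn = spec.neuralNetwork  # see the note below if your model is a classifier
for i, layer in enumerate(nn.layers):
    print(i, layer.name, layer.WhichOneof("layer"))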
Note: for this model, use nn = spec.neuralNetwork instead of spec.neuralNetworkClassifier, since it isn't a classifier.