I've seen a number of super-resolution networks that seem to imply it's fine to train a network on inputs of shape (x, y, d) but then pass images of arbitrary size into the model for prediction. In Keras, for example, this is specified with the placeholder shape (None, None, 3), which will accept any size.
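In Keras I mean something like this (a hypothetical minimal model, not taken from any particular repo):

```python
from tensorflow.keras import layers, Model

# Spatial dimensions are left as None, so the model accepts any image size.
inputs = layers.Input(shape=(None, None, 3))
outputs = layers.Conv2D(3, 3, padding="same")(inputs)
model = Model(inputs, outputs)  # works on 24x24x3 as well as 124x118x3 inputs
```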
For example, https://github.com/krasserm/super-resolution is trained on 24x24x3 inputs but accepts arbitrarily sized images for upscaling; the demo code uses a 124x118x3 input.
Is this a sane practice? When given a larger input, does the network simply slide a window over it, applying the same weights it learnt on the smaller images?
Your guess is correct. Convolutional layers learn to distinguish features at the scale of their kernel, not at the scale of the image as a whole. A layer with a 3x3 kernel learns to identify features up to 3x3 pixels in size, and it can identify those features whether the image itself is 3x3, 100x100, or 1080x1920.
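To make this concrete, here is a minimal fully convolutional sketch in Keras (an illustrative toy, not the architecture used by that repository): the model is defined with a (None, None, 3) input, fitted on 24x24 patches, and then applied unchanged to a larger image. Because the learned kernels are applied at every spatial position, the only thing that changes with a bigger input is the size of the output feature maps.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_sr_model(upscale_factor=2):
    # Spatial dimensions are None, so any image size is accepted.
    inputs = layers.Input(shape=(None, None, 3))
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    # Upscale with a sub-pixel (depth-to-space) layer, a common choice in SR models.
    x = layers.Conv2D(3 * upscale_factor**2, 3, padding="same")(x)
    outputs = layers.Lambda(lambda t: tf.nn.depth_to_space(t, upscale_factor))(x)
    return tf.keras.Model(inputs, outputs)

model = build_sr_model()

# "Train" on 24x24x3 patches (random data here, just to demonstrate the shapes)...
lr_patches = np.random.rand(8, 24, 24, 3).astype("float32")
hr_patches = np.random.rand(8, 48, 48, 3).astype("float32")
model.compile(optimizer="adam", loss="mae")
model.fit(lr_patches, hr_patches, epochs=1, verbose=0)

# ...then predict on a differently sized image with exactly the same weights.
big_image = np.random.rand(1, 124, 118, 3).astype("float32")
print(model.predict(big_image).shape)  # (1, 248, 236, 3)
```

The only requirement is that every layer in the model is size-agnostic (convolutions, activations, depth-to-space, etc.). A Flatten followed by a Dense layer, for instance, would bake the training resolution into the weight shapes and break this property.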