It seems that the maximum resolution for the built-in H264 MFT is 4096 × 2304 pixels, according to the documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/dd797815(v=vs.85).aspx
Is it possible to use a different MFT that would allow larger frame sizes, or is it just not possible?
The limitations lie with the encoded format and the transforms, not with Media Foundation itself. H265 (HEVC) can support up to 4320p (7680 x 4320) with improved picture quality over H264. However, as you probably noticed, while H265 is supported on Windows 10, the current implementation of the H265 decoder also has a limitation of 4096 x 2304. It may be possible to purchase H265 SDKs that provide transforms and sinks for other versions of Windows and support higher resolutions.
Info on 8K UHD [4320p] is described here.
And info on the names of the various resolutions is here.
Google's VP9 project only supports up to 4096 x 2304 as well. It has been noted that Google engineers are working on VP10 and have started pushing code into the public libvpx repository, where VP8 and VP9 reside. But that will take a while, and much will change before it is ready.
Of course, if you are feeling a bit code crazy, you could always write a custom transform and sink for Media Foundation and support whatever you like with respect to format and resolution, perhaps even QUHD 16K (15360 x 8640).
Sounds like you have an interesting project on your hands, but you may be a bit ahead of your time. Hope this helps.
EDIT:
You make a good point (with your comment). If your color format is standard, there is a chance you can use CLSID_CColorConvertDMO to perform the conversion; it will use SIMD registers/instructions when possible (I am not sure whether it has size constraints). It has a dual interface as both a DMO and an MFT, which definitely makes life a bit easier.
After creating the CLSID_CColorConvertDMO instance and setting the input and output types (format, frame size, etc.), create an IMFSample (using MFCreateSample) and add an IMFMediaBuffer (created with MFCreateMemoryBuffer) to it (using IMFSample::AddBuffer). Then all that is necessary is to call ProcessInput and ProcessOutput to convert each buffer (create all of the items up front).
RomanR also provides some code here.
If your format is not standard, you still have good options. After using C++ AMP for a bit (which is a wrapper around DirectCompute and HLSL), it felt more natural for me to simply use DirectCompute and HLSL directly. In the simplest terms, you create views of the data in your memory structs which map them to the GPU, and HLSL is very C-like and not hard to learn.
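To give a feel for how C-like the kernel code is, here is the per-pixel YUV-to-RGB math written in plain C++; an HLSL compute shader body would look almost identical, with each GPU thread running this on the texel addressed by its dispatch thread ID. (The BT.601 studio-range coefficients are my example choice — substitute whatever matrix matches your format.)

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// Round and clamp a float to the 0..255 byte range.
static uint8_t clamp8(float v)
{
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
}

// BT.601 studio-range (Y in 16..235) YUV -> RGB for one pixel.
// In HLSL this would be the body of a compute shader, one thread per pixel.
Rgb YuvToRgb601(uint8_t y, uint8_t u, uint8_t v)
{
    const float c = 1.164f * (y - 16);
    const float d = u - 128.0f;
    const float e = v - 128.0f;
    return {
        clamp8(c + 1.596f * e),               // R
        clamp8(c - 0.392f * d - 0.813f * e),  // G
        clamp8(c + 2.017f * d),               // B
    };
}
```

For example, studio black (Y=16, U=V=128) maps to (0, 0, 0) and studio white (Y=235, U=V=128) maps to (255, 255, 255). On the GPU you dispatch one thread group per tile of the frame, so the conversion cost is largely hidden behind the memory transfers.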
After your transforms are taken care of, you only have to decide on the storage format (headers, etc.) and provide the source and sink (for reading and writing/displaying). I suspect the result would be a bit bloated compared to H264/H265 (increased I/O), but there would be little overhead in the conversion itself.