I have a low-level application (really low-level: it's basically all IOCTL calls plus several calls to enumeration APIs) that crashes sporadically on Windows Vista/7 on clients' machines. Unfortunately, I have not been able to obtain any crash dumps, but one helpful user did mention that running the program in XP Compatibility Mode solved the problem.
The application is always launched with full admin rights (it's launched from another program that requires admin authorization) so it's not a UAC issue. I don't use any deprecated APIs and I'm not relying on any registry hacks, etc. I'm just issuing calls to enumerate disks, then using IOCTL commands to get some more low-level info about all attached devices.
What happens in XP Compatibility mode? What does Windows inject into my application or otherwise sandbox it with that prevents it from crashing on Vista/7? I had originally suspected heap corruption (though I've pulled my hair out attempting to replicate or to track down the issue) before being told that it runs fine in XP Compatibility Mode.
Can anyone suggest any possible issues that would be avoided in XP Compat Mode that I should look into to try to address this issue? Thanks!
EDIT:
One more thing that's probably very important to mention: I'm calling DDK/Kernel functions from userspace in order to get at certain features not exposed via the WIN32 API.
I'm using ZwReadFile, ZwCreateFile, ZwWriteFile, RtlInitUnicodeString, ZwQueryVolumeInformationFile, ZwDeviceIoControlFile, ZwSetInformationFile, ZwClose.
The IOCTLs I'm calling include IOCTL_DISK_GET_PARTITION_INFO_EX, IOCTL_STORAGE_GET_DEVICE_NUMBER, IOCTL_DISK_GET_LENGTH_INFO, and IOCTL_DISK_GET_DRIVE_LAYOUT_EX.
This is very odd, but I was calling ZwQueryVolumeInformationFile with FsInformationClass set to FileFsVolumeInformation.
I had passed in a FILE_FS_VOLUME_INFORMATION buffer, first allocated normally, then overallocated to (sizeof(FILE_FS_VOLUME_INFORMATION) + sizeof(TCHAR)*FILE_FS_VOLUME_INFORMATION->VolumeLabelLength).
Then I called
FILE_FS_VOLUME_INFORMATION->VolumeLabel[FILE_FS_VOLUME_INFORMATION->VolumeLabelLength/2] = _T('\0');
and only on some machines this would result in memory corruption.
Regardless of the size of the overallocation (I even tried allocating a full 256 chars extra!), this would reliably result in heap corruption, even when using a vector<unsigned char> as the FILE_FS_VOLUME_INFORMATION buffer.
It seems that the kernel places some sort of write protection on the buffer that was resulting in corruption regardless of the size. Copying the first VolumeLabelLength bytes to a second buffer, then appending _T('\0') to that copy, solved the problem. I'm not sure how or why Windows was making the buffer that I allocated and passed in as a parameter read-only, or whether it was storing something after the FILE_FS_VOLUME_INFORMATION struct (which should end with the character array!), but simply not modifying any data in the buffer that I passed did the trick... which is crazy, because it only happens (consistently and 100% reproducibly) on certain machines.
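A minimal sketch of the copy-instead-of-modify workaround, written as portable C++ so it can be tried without the DDK headers. It is not the code from the application: `VolumeInfoLayout` is a hypothetical stand-in that mirrors the documented field order of FILE_FS_VOLUME_INFORMATION, `copy_volume_label` is an illustrative name, and `char16_t` is assumed to match the 16-bit WCHAR used on Windows. The buffer that was handed to ZwQueryVolumeInformationFile is only read; the terminator lives in the separate string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for FILE_FS_VOLUME_INFORMATION (same field order).
struct VolumeInfoLayout {
    std::int64_t  VolumeCreationTime;
    std::uint32_t VolumeSerialNumber;
    std::uint32_t VolumeLabelLength;   // length of the label in BYTES
    std::uint8_t  SupportsObjects;
    char16_t      VolumeLabel[1];      // first character of a variable-length label
};

// Copies the label out of the raw result buffer without writing to it.
// The returned string owns its own storage and its own terminator.
std::u16string copy_volume_label(const std::vector<unsigned char>& buffer) {
    const std::size_t label_offset = offsetof(VolumeInfoLayout, VolumeLabel);
    if (buffer.size() < label_offset) {
        return std::u16string();       // too small to hold even the fixed part
    }

    std::uint32_t reported = 0;
    std::memcpy(&reported,
                buffer.data() + offsetof(VolumeInfoLayout, VolumeLabelLength),
                sizeof reported);

    // Never trust the reported length beyond what the buffer actually holds.
    std::size_t label_bytes = reported;
    const std::size_t available = buffer.size() - label_offset;
    if (label_bytes > available) {
        label_bytes = available;
    }
    assert(label_offset + label_bytes <= buffer.size());

    // VolumeLabelLength is in bytes, so divide to get the character count.
    std::u16string label(label_bytes / sizeof(char16_t), u'\0');
    if (!label.empty()) {
        std::memcpy(&label[0], buffer.data() + label_offset,
                    label.size() * sizeof(char16_t));
    }
    return label;
}
```

In a real build the same idea would apply to the actual FILE_FS_VOLUME_INFORMATION struct and a `std::wstring`; `label.c_str()` then gives a null-terminated copy while the original buffer stays untouched.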
At any rate: problem solved *phew*!