I need to checksum every single file on a given USB disk in a C# application. I suspect the bottleneck is the actual read off the disk, so I'm looking to make this as fast as possible.
I think this would be much quicker if I could read the files sequentially, in the order they physically appear on the disk (assuming the drive is not fragmented).
How can I find this information for each file from its standard path? For example, given a file at "F:\MyFile.txt", how can I find the start location of this file on the disk?
I'm running the C# application on Windows.
Now... I don't really know whether this will be useful for you, but this is how you can get the first cluster of a file:
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Wrapper class (the name is arbitrary)
public static class ClusterInfo
{
    [StructLayout(LayoutKind.Sequential)]
    public struct StartingVcnInputBuffer
    {
        public long StartingVcn;
    }

    public static readonly int StartingVcnInputBufferSizeOf = Marshal.SizeOf(typeof(StartingVcnInputBuffer));

    // Mirrors RETRIEVAL_POINTERS_BUFFER with room for exactly one extent
    [StructLayout(LayoutKind.Sequential)]
    public struct RetrievalPointersBuffer
    {
        public uint ExtentCount;
        public long StartingVcn;
        public long NextVcn;
        public long Lcn;
    }

    public static readonly int RetrievalPointersBufferSizeOf = Marshal.SizeOf(typeof(RetrievalPointersBuffer));

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern SafeFileHandle CreateFileW(
        [MarshalAs(UnmanagedType.LPWStr)] string filename,
        [MarshalAs(UnmanagedType.U4)] FileAccess access,
        [MarshalAs(UnmanagedType.U4)] FileShare share,
        IntPtr securityAttributes,
        [MarshalAs(UnmanagedType.U4)] FileMode creationDisposition,
        [MarshalAs(UnmanagedType.U4)] FileAttributes flagsAndAttributes,
        IntPtr templateFile);

    [DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
    static extern bool DeviceIoControl(SafeFileHandle hDevice, uint dwIoControlCode,
        ref StartingVcnInputBuffer lpInBuffer, int nInBufferSize,
        out RetrievalPointersBuffer lpOutBuffer, int nOutBufferSize,
        out int lpBytesReturned, IntPtr lpOverlapped);

    const uint FSCTL_GET_RETRIEVAL_POINTERS = 0x00090073; // 589939
    const int ERROR_HANDLE_EOF = 38;
    const int ERROR_MORE_DATA = 234;

    // Returns a FileStream that can only Read
    public static void GetStartLogicalClusterNumber(string fileName, out FileStream file, out long startLogicalClusterNumber)
    {
        // FileAccess.Read (1) happens to equal FILE_READ_DATA; 0x80 adds FILE_READ_ATTRIBUTES
        SafeFileHandle handle = CreateFileW(fileName, FileAccess.Read | (FileAccess)0x80 /* FILE_READ_ATTRIBUTES */, FileShare.Read, IntPtr.Zero, FileMode.Open, 0, IntPtr.Zero);
        if (handle.IsInvalid)
        {
            throw new Win32Exception();
        }
        file = new FileStream(handle, FileAccess.Read);

        var svib = new StartingVcnInputBuffer(); // StartingVcn = 0: start from the first cluster
        RetrievalPointersBuffer rpb;
        int bytesReturned;
        bool ok = DeviceIoControl(handle, FSCTL_GET_RETRIEVAL_POINTERS, ref svib, StartingVcnInputBufferSizeOf, out rpb, RetrievalPointersBufferSizeOf, out bytesReturned, IntPtr.Zero);
        // Read the last error only if the call failed; on success it isn't guaranteed to be 0
        int error = ok ? 0 : Marshal.GetLastWin32Error();
        switch (error)
        {
            case ERROR_HANDLE_EOF:
                startLogicalClusterNumber = -1; // empty file. Choose how to handle
                break;
            case 0: // NO_ERROR
            case ERROR_MORE_DATA: // more extents than fit in rpb; the first one is still filled in
                startLogicalClusterNumber = rpb.Lcn;
                break;
            default:
                file.Dispose(); // don't leak the handle when throwing
                throw new Win32Exception(error);
        }
    }
}
Note that the method also returns a read-only FileStream that you can keep open and use to read the file. Alternatively, you can easily modify it to not create (and not return) the stream, and then reopen the file when you want to hash it.
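Since the end goal is a checksum per file, here is a minimal sketch of hashing such a stream. It assumes SHA-256 as the checksum (the question doesn't name an algorithm), and `FileHasher`/`HashStream` are hypothetical names, not part of any API.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileHasher
{
    // Hypothetical helper: hashes the stream from its current position to the end
    // with SHA-256 (an assumed choice of checksum) and returns upper-case hex.
    public static string HashStream(Stream stream)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

For example, after calling GetStartLogicalClusterNumber(fileName, out file, out lcn) you could call FileHasher.HashStream(file) and then dispose file. On .NET 5 or later, Convert.ToHexString(hash) is a shorter alternative to the BitConverter call.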
To use:
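// Top-level files only; see SearchOption.AllDirectories for subfolders
string[] fileNames = Directory.GetFiles(@"D:\");
foreach (string fileName in fileNames)
{
    try
    {
        long startLogicalClusterNumber;
        FileStream file;
        GetStartLogicalClusterNumber(fileName, out file, out startLogicalClusterNumber);
        // Record (fileName, startLogicalClusterNumber) here for sorting. Dispose the
        // stream now, or keep it open if you want to hash from it later.
        file.Dispose();
    }
    catch (Exception e)
    {
        Console.WriteLine("Skipping: {0} for {1}", fileName, e.Message);
    }
}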
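The point of getting the LCN is to read the files in on-disk order. Below is a sketch of that step, assuming it's acceptable to collect every start cluster first and sort afterwards. `ReadOrder`/`OrderByStartCluster` are hypothetical names, and the cluster lookup is passed in as a delegate so the ordering logic can be exercised without a real disk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReadOrder
{
    // Hypothetical helper: looks up the start cluster of every file via the supplied
    // delegate, skips files whose lookup throws, and returns the names sorted by LCN.
    // Empty files (LCN -1) sort first; they have no data to read.
    public static List<string> OrderByStartCluster(IEnumerable<string> fileNames, Func<string, long> getStartLcn)
    {
        var entries = new List<(string Name, long Lcn)>();
        foreach (string name in fileNames)
        {
            try
            {
                entries.Add((name, getStartLcn(name)));
            }
            catch (Exception ex)
            {
                Console.WriteLine("Skipping: {0} for {1}", name, ex.Message);
            }
        }
        // OrderBy is a stable sort, so files with equal LCNs keep their input order
        return entries.OrderBy(x => x.Lcn).Select(x => x.Name).ToList();
    }
}

// Example wiring on Windows, using GetStartLogicalClusterNumber from the answer:
//   List<string> ordered = ReadOrder.OrderByStartCluster(
//       Directory.GetFiles(@"D:\"),
//       name =>
//       {
//           GetStartLogicalClusterNumber(name, out FileStream f, out long lcn);
//           f.Dispose();
//           return lcn;
//       });
//   // ...then open and hash the files in the order given by 'ordered'.
```

Closing each stream during the lookup and reopening it later keeps the number of open handles small; keeping the streams open instead saves one open per file.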
I'm using the API described here: https://web.archive.org/web/20160130161216/http://www.wd-3.com/archive/luserland.htm. The code is much simpler because you only need the initial Logical Cluster Number (LCN). An earlier version of the code extracted all the LCN extents, but that was useless, because you have to hash a file from its first byte to its last anyway. Note that empty (zero-length) files don't have any cluster allocated: DeviceIoControl fails with ERROR_HANDLE_EOF, and the function returns -1 for the cluster. You can choose how to handle that case.
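If you want the start location as a byte offset rather than a cluster number, one option is to multiply the LCN by the volume's cluster size, which GetDiskFreeSpaceW reports as sectors per cluster times bytes per sector. The sketch below uses hypothetical names (`ClusterOffsets`, `GetBytesPerCluster`, `LcnToByteOffset`) and assumes an NTFS volume, where LCN 0 is the first cluster of the volume. I'm not certain the same numbering holds on FAT/exFAT (common on USB sticks), so treat the offset as approximate there; for simply ordering the reads, the raw LCN is enough.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class ClusterOffsets
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = true)]
    static extern bool GetDiskFreeSpaceW(string lpRootPathName,
        out uint lpSectorsPerCluster, out uint lpBytesPerSector,
        out uint lpNumberOfFreeClusters, out uint lpTotalNumberOfClusters);

    // Cluster size in bytes for a volume root such as @"F:\" (Windows only)
    public static long GetBytesPerCluster(string rootPath)
    {
        if (!GetDiskFreeSpaceW(rootPath, out uint sectorsPerCluster, out uint bytesPerSector, out _, out _))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        return (long)sectorsPerCluster * bytesPerSector;
    }

    // Byte offset of a cluster relative to where LCN 0 starts (the volume start on NTFS).
    // Passes through the -1 "no clusters" marker used for empty files.
    public static long LcnToByteOffset(long lcn, long bytesPerCluster)
    {
        if (lcn < 0)
        {
            return -1;
        }
        return checked(lcn * bytesPerCluster);
    }
}
```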