c++ vb.net windows performance ntfs-mft

Why is file enumeration using DeviceIoControl faster in VB.NET than in C++?


I am trying to read the Windows Master File Table (MFT) for fast enumeration of files. So far I have seen two approaches to do this:

  1. As suggested by Jeffrey Cooperstein and Jeffrey Richter using DeviceIoControl
  2. Direct parsing of the MFT, as presented in some open-source tools and An NTFS Parser Lib

For my project I am focusing on approach [1]. The problem I am facing is mostly related to execution time. Just to be clear, the following is my system and development environment:

  1. IDE - Visual Studio 2013
  2. Language - C++
  3. OS - Windows 7 Professional x64
  4. 32-bit binaries are generated for both the C++ and .NET code.

Problem

I compared a slightly modified version of the code from [1] with a VB.NET implementation available on CodePlex. The issue is that when I uncomment the statement in the inner loop, the C++ execution time increases by a factor of 7-8. I haven't implemented path matching in the C++ code (the VB code has it).

Q1. How can I improve the performance of the C++ code?

Timings for enumerating C:\ drive on my machine:

  1. C++ (with uncommented statement in inner loop) - 21 seconds
  2. VB.NET (with additional path matching code) - 3.5 seconds

For clarity, here are the C++ and VB.NET snippets.

C++

bool FindAll()
{
    if (m_hDrive == NULL) // Handle of, for example, "\\.\C:"
        return false;

    USN_JOURNAL_DATA ujd = {0};
    DWORD cb = 0;
    BOOL bRet = FALSE;
    MFT_ENUM_DATA med = {0};

    BYTE pData[sizeof(DWORDLONG) + 0x10000] = {0};

    bRet = DeviceIoControl(m_hDrive, FSCTL_QUERY_USN_JOURNAL, NULL, 0, &ujd, sizeof(USN_JOURNAL_DATA), &cb, NULL);
    if (bRet == FALSE) return false;

    med.StartFileReferenceNumber = 0;
    med.LowUsn = 0;
    med.HighUsn = ujd.NextUsn;

    //Outer Loop
    while (TRUE)
    {
        bRet = DeviceIoControl(m_hDrive, FSCTL_ENUM_USN_DATA, &med, sizeof(med), pData, sizeof(pData), &cb, NULL);
        if (bRet == FALSE) {
            break;
        }

        PUSN_RECORD pRecord = (PUSN_RECORD)&pData[sizeof(USN)];

        //Inner Loop
        while ((PBYTE)pRecord < (pData + cb))
        {
            tstring sz((LPCWSTR) ((PBYTE)pRecord + pRecord->FileNameOffset), pRecord->FileNameLength / sizeof(WCHAR));

            bool isFile = ((pRecord->FileAttributes & FILE_ATTRIBUTE_DIRECTORY) != FILE_ATTRIBUTE_DIRECTORY);
            if (isFile) m_dwFiles++;
            //m_nodes[pRecord->FileReferenceNumber] = new CNode(pRecord->ParentFileReferenceNumber, sz, isFile);

            pRecord = (PUSN_RECORD)((PBYTE)pRecord + pRecord->RecordLength);
        }
        med.StartFileReferenceNumber = *(DWORDLONG *)pData;
    }
    return true;
}

Here m_nodes is of type NodeMap, defined as typedef std::map<DWORDLONG, CNode*> NodeMap;

VB.NET

Public Sub FindAllFiles(ByVal szDriveLetter As String, fFileFound As FileFound_Delegate, fProgress As Progress_Delegate, fMatch As IsMatch_Delegate)

        Dim usnRecord As USN_RECORD
        Dim mft As MFT_ENUM_DATA
        Dim dwRetBytes As Integer
        Dim cb As Integer
        Dim dicFRNLookup As New Dictionary(Of Long, FSNode)
        Dim bIsFile As Boolean

        ' This shouldn't be called more than once.
        If m_Buffer.ToInt32 <> 0 Then
            Console.WriteLine("invalid buffer")
            Exit Sub
        End If

        ' progress 
        If Not IsNothing(fProgress) Then fProgress.Invoke("Building file list.")

        ' Assign buffer size
        m_BufferSize = 65536 '64KB

        ' Allocate a buffer to use for reading records.
        m_Buffer = Marshal.AllocHGlobal(m_BufferSize)

        ' correct path
        szDriveLetter = szDriveLetter.TrimEnd("\"c)

        ' Open the volume handle 
        m_hCJ = OpenVolume(szDriveLetter)

        ' Check if the volume handle is valid.
        If m_hCJ = INVALID_HANDLE_VALUE Then
            Console.WriteLine("Couldn't open handle to the volume.")
            Cleanup()
            Exit Sub
        End If

        mft.StartFileReferenceNumber = 0
        mft.LowUsn = 0
        mft.HighUsn = Long.MaxValue

        Do
            If DeviceIoControl(m_hCJ, FSCTL_ENUM_USN_DATA, mft, Marshal.SizeOf(mft), m_Buffer, m_BufferSize, dwRetBytes, IntPtr.Zero) Then
                cb = dwRetBytes
                ' Pointer to the first record
                Dim pUsnRecord As New IntPtr(m_Buffer.ToInt32() + 8)

                While (dwRetBytes > 8)
                    ' Copy pointer to USN_RECORD structure.
                    usnRecord = Marshal.PtrToStructure(pUsnRecord, usnRecord.GetType)

                    ' The filename within the USN_RECORD.
                    Dim FileName As String = Marshal.PtrToStringUni(New IntPtr(pUsnRecord.ToInt32() + usnRecord.FileNameOffset), usnRecord.FileNameLength / 2)

                    'If Not FileName.StartsWith("$") Then
                    ' use a delegate to determine if this file even matches our criteria
                    Dim bIsMatch As Boolean = True
                    If Not IsNothing(fMatch) Then fMatch.Invoke(FileName, usnRecord.FileAttributes, bIsMatch)

                    If bIsMatch Then
                        bIsFile = Not usnRecord.FileAttributes.HasFlag(FileAttribute.Directory)
                        dicFRNLookup.Add(usnRecord.FileReferenceNumber, New FSNode(usnRecord.FileReferenceNumber, usnRecord.ParentFileReferenceNumber, FileName, bIsFile))
                    End If
                    'End If

                    ' Pointer to the next record in the buffer.
                    pUsnRecord = New IntPtr(pUsnRecord.ToInt32() + usnRecord.RecordLength)

                    dwRetBytes -= usnRecord.RecordLength
                End While

                ' The first 8 bytes is always the start of the next USN.
                mft.StartFileReferenceNumber = Marshal.ReadInt64(m_Buffer, 0)

            Else

                Exit Do

            End If

        Loop Until cb <= 8

        If Not IsNothing(fProgress) Then fProgress.Invoke("Parsing file names.")

        ' Resolve all paths for Files
        For Each oFSNode As FSNode In dicFRNLookup.Values.Where(Function(o) o.IsFile)
            Dim sFullPath As String = oFSNode.FileName
            Dim oParentFSNode As FSNode = oFSNode

            While dicFRNLookup.TryGetValue(oParentFSNode.ParentFRN, oParentFSNode)
                sFullPath = String.Concat(oParentFSNode.FileName, "\", sFullPath)
            End While
            sFullPath = String.Concat(szDriveLetter, "\", sFullPath)

            If Not IsNothing(fFileFound) Then fFileFound.Invoke(sFullPath, 0)
        Next

        '// cleanup
        Cleanup() '//Closes all the handles
        If Not IsNothing(fProgress) Then fProgress.Invoke("Complete.")
    End Sub

Where fFileFound is defined as follows:

Sub(s, l)
    If s.ToLower.StartsWith(sSearchPath) Then
        lCount += 1
        lstFileNames.Add(s.ToLower) '// Dim lstFileNames As List(Of String)
    End If
End Sub

FSNode and CNode have the following structure:

//C++ version
class CNode
{
public:
    //DWORDLONG m_dwFRN;
    DWORDLONG m_dwParentFRN;
    tstring m_sFileName;
    bool m_bIsFile;

public:
    CNode(DWORDLONG dwParentFRN, tstring sFileName, bool bIsFile = false) : 
        m_dwParentFRN(dwParentFRN), m_sFileName(sFileName), m_bIsFile(bIsFile){
    }
    ~CNode(){
    }
};

Note - The VB.NET code spawns a new thread (needed because it has a GUI), whereas I am calling the C++ function on the main thread (a simple console application for testing).


Update

It was a silly mistake on my side. The DeviceIoControl API is working as expected, although the Debug build is a bit slower than the Release build. See the following related question:

how-can-i-increase-the-performance-in-a-map-lookup-with-key-type-stdstring


Solution

  • I didn't run your code, but since you say the commented-out line is the issue, the problem is probably the map insertion. The C++ code uses a std::map, which is typically implemented as a balanced tree (sorted by key, O(log n) access time). The VB code uses a Dictionary, which is implemented as a hash table (no sorting, constant access time on average). Try using a std::unordered_map in the C++ version.
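
As a rough illustration of that suggestion, here is a minimal sketch of a hash-based node table. It is not the asker's actual code: Node, AddNode and BuildPath are hypothetical names, std::uint64_t stands in for DWORDLONG, and std::wstring stands in for tstring so that the snippet compiles without the Windows headers. Storing nodes by value instead of calling new CNode for every record is an additional assumption that avoids one heap allocation per file.

```cpp
#include <cassert>        // assert, for callers that want to check results
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the question's CNode, stored by value.
struct Node {
    std::uint64_t parentFrn;  // parent file reference number
    std::wstring  name;       // file or directory name
    bool          isFile;
};

// Hash table keyed by file reference number (FRN), replacing std::map.
using NodeMap = std::unordered_map<std::uint64_t, Node>;

// Insert one record; returns false if the FRN was already present.
inline bool AddNode(NodeMap& nodes, std::uint64_t frn, std::uint64_t parentFrn,
                    std::wstring name, bool isFile)
{
    return nodes.emplace(frn, Node{parentFrn, std::move(name), isFile}).second;
}

// Rebuild a full path by following parent FRNs until a parent is not found,
// mirroring the lookup loop in the VB.NET code.
inline std::wstring BuildPath(const NodeMap& nodes, std::uint64_t frn,
                              const std::wstring& drive)
{
    std::wstring path;
    auto it = nodes.find(frn);
    while (it != nodes.end()) {
        path = path.empty() ? it->second.name : it->second.name + L"\\" + path;
        const std::uint64_t parent = it->second.parentFrn;
        if (parent == it->first) break;  // defensive: entry lists itself as parent
        it = nodes.find(parent);
    }
    return drive + L"\\" + path;
}
```

In the inner loop, the commented-out statement would then become something like AddNode(m_nodes, pRecord->FileReferenceNumber, pRecord->ParentFileReferenceNumber, sz, isFile). If the approximate number of files is known, calling m_nodes.reserve(expectedCount) before the outer loop should reduce rehashing. Timings are best compared in a Release build, since the standard containers are noticeably slower in Debug builds.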