VBA String Normalization (via WinAPI)


I'm new to writing VBA code that calls WinAPI functions. What encoding does the WinAPI NormalizeString() function work with? I would expect UTF-16, but the following does not work. The character count does not appear to be calculated correctly, and the attempt to actually create a normalized string simply crashes Access.

'normFormEnum
'not random numbers, but from ...
'https://msdn.microsoft.com/en-us/library/windows/desktop/dd319094(v=vs.85).aspx
'for use in calling the Win API Function NormalizeString()
Public Enum normFormEnum
    normFOther = 0
    normFC = 1      'the W3C (Internet) required normalization format
    normFD = 2
    normFKC = 5
    normFKD = 6
End Enum

'https://msdn.microsoft.com/en-us/library/windows/desktop/dd319093(v=vs.85).aspx
Private Declare Function NormalizeString Lib "Normaliz" ( _
    ByVal normForm As normFormEnum, _
    ByVal lpSrcString As LongPtr, _
    ByVal cwSrcLength As Long, _
    ByRef lpDstString As LongPtr, _
    ByVal cwDstLength As Long _
    ) As Long

Public Function stringNormalize( _
    ByVal theString As String, _
    Optional ByVal normForm As normFormEnum = normFC _
    ) As String

    Dim nChars As Long
    Dim newString As String

    nChars = NormalizeString(normForm, StrPtr(theString), Len(theString), 0&, 0)

    'prefill the string buffer so it can be altered shortly...
    newString = String(nChars, " ")

    Debug.Print nChars
    'prints nChars, showing that it is 3x the number of characters.

    'The following will crash the application...

    '    NormalizeString normForm, StrPtr(theString), Len(theString), StrPtr(newString), nChars

    stringNormalize = newString

End Function

Solution

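  • When cwDstLength is 0, NormalizeString returns an estimated buffer length, not the exact length of the normalized string. That is why the first result can be much larger than the final character count (the 3x you are seeing). The crash most likely comes from your declaration: lpDstString is declared ByRef, so the API receives the address of a temporary that holds the pointer instead of the string buffer itself, and writes past it. Declare it as ByVal lpDstString As LongPtr, and add PtrSafe so the declaration also compiles in 64-bit Office.

    So size the buffer from the estimate returned by the first call, then truncate it to the length returned by the second call: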

    Private Declare PtrSafe Function NormalizeString Lib "Normaliz" ( _
      ByVal normForm As Long, _
      ByVal lpSrcString As LongPtr, _
      ByVal cwSrcLength As Long, _
      ByVal lpDstString As LongPtr, _
      ByVal cwDstLength As Long _
    ) As Long
    
    Public Enum NormalizationForm
      NormOther = 0
      NormC = 1
      NormD = 2
      NormKC = 5
      NormKD = 6
    End Enum
    
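    Public Function NormalizeStr(source As String, ByVal normForm As NormalizationForm) As String
      Dim buffer As String, size As Long, i As Long
    
      'The first pass runs with an empty buffer (length 0) and only yields a size estimate;
      'later passes retry with a buffer of that size.
      For i = 1 To 5
        size = NormalizeString(normForm, StrPtr(source), Len(source), StrPtr(buffer), Len(buffer))
    
        'Success: the result fits, so trim the buffer to the actual length.
        If size >= 0 And size < Len(buffer) Then
          NormalizeStr = Left$(buffer, size)
          Exit Function
        End If
    
        'A value <= 0 signals failure; for an insufficient buffer its magnitude is the suggested size.
        buffer = String$(Abs(size) + 1, 0)
      Next
    
      Err.Raise 9, , "NormalizeString failed"
    End Function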
    
    Public Sub Usage()
      Debug.Print NormalizeStr(ChrW(196), NormD)        'U+00C4 decomposed into "A" + U+0308
      Debug.Print NormalizeStr("A" & ChrW(776), NormC)  '"A" + U+0308 composed into U+00C4
    End Sub
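    
    The Immediate window may not render combining characters, so the Debug.Print output alone can make it hard to tell whether the normalization worked. A minimal sketch of one way to check, assuming the NormalizeStr function above is in the same module. CodeUnitsHex and CheckNormalization are hypothetical helper names, not part of any API:
    
    ```vba
    'Hypothetical helper: returns the UTF-16 code units of s as 4-digit hex values.
    Public Function CodeUnitsHex(ByVal s As String) As String
      Dim i As Long, parts() As String
    
      If Len(s) = 0 Then Exit Function
    
      ReDim parts(0 To Len(s) - 1)
      For i = 1 To Len(s)
        'AscW returns a signed Integer, so mask it into the range 0..65535.
        parts(i - 1) = Right$("0000" & Hex$(AscW(Mid$(s, i, 1)) And &HFFFF&), 4)
      Next
    
      CodeUnitsHex = Join(parts, " ")
    End Function
    
    Public Sub CheckNormalization()
      'U+00C4 decomposes canonically to U+0041 U+0308, so form D should give "0041 0308".
      Debug.Print CodeUnitsHex(NormalizeStr(ChrW(196), NormD))
      'Form C should compose the pair back into the single code unit "00C4".
      Debug.Print CodeUnitsHex(NormalizeStr("A" & ChrW(776), NormC))
    End Sub
    ```
    
    Comparing code units rather than the printed text also makes it easy to see that the returned string has been truncated to the real length and carries no trailing null characters from the oversized buffer.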