Search code examples
sqlregexvbams-access

VBA Regex in MSAccess - finding a double quote character


Seemingly simple problem. I need to find those records where "long text" field1 in table notes contains lines starting with -" (i.e. hyphen and double quote) using VBA regular expressions in SQL. (The field1 is using rich text formatting (table and form field). I could drop it, if necessary, but in my context it is useful to have rich text).

TLDR;

Double quote is stored as its html entity " can be reverted by PLAINTEXT() function

enter image description here

To make things easier I start by simply trying to match fields containing any quote characters: I can find fields containing single quote without a problem:

SELECT *
FROM notes
WHERE RegexMatch(field1, ".*'.*")
;

but it fails with double-quote

SELECT *
FROM notes
WHERE RegexMatch(field1, ".*"".*")
;

Where I try to escape the quote character by another quote. What am I doing wrong? I have also pasted my RegexMatch() function in case this is relevant.

' ----------------------------------------------------------------------'
' Return True if the given string value matches the given Regex pattern '
' ----------------------------------------------------------------------'
Public Function RegexMatch(value As Variant, pattern As String) As Boolean
    If IsNull(value) Then Exit Function
    ' Using a static, we avoid re-creating the same regex object for every call '
    Static regex As Object
    ' Initialise the Regex object '
    If regex Is Nothing Then
        Set regex = CreateObject("vbscript.regexp")
        With regex
            .Global = True
            .IgnoreCase = True
            .Multiline = True
        End With
    End If
    ' Update the regex pattern if it has changed since last time we were called '
    If regex.pattern <> pattern Then regex.pattern = pattern
    ' Test the value against the pattern '
    RegexMatch = regex.Test(value)
End Function

Many thanks for your help

Query now works well:

SELECT *
FROM notes
WHERE RegexMatch(PLAINTEXT(notes.field1), "-""")
;

Solution

  • If the column uses Rich Text format, then " is internally stored as &quot;.

    Any query for that column will work with the HTML source text, not the formatted HTML.

    You can see the source text by creating a form based on the table, with a textbox for the field that has TextBox.TextFormat = Plain Text.


    Erik A makes a very good point in the comment:

    You can use the PlainText function to query the plain text representation of the rich text.

    Then you can use this to query for double quotes:

    WHERE PlainText(fld) LIKE "*""*"
    

    or

    WHERE PlainText(fld) LIKE '*"*'