Search code examples
regexhaskellasciibytestring

Matching 8-bit characters in Haskell Text.Regex.Posix.ByteString


my Haskell application reads input as a list of ByteString and I'm using Text.Regex.Posix.ByteString.regexec to find matches. Some input has a character code 253 (it's a 1/2 symbol in one IBM PC character set) and it seems that the pattern '.' (i.e., dot, "match any character") doesn't match it. Any way to make it match ?


Solution

  • This works for me on a Windows Haskell install:

    > length $ ((pack ['\1'..'\253']) =~ "." :: [[ByteString]])
    252
    

    I.e. dot matches all characters in range including code 253.

    Note that the library calls out to the underlying posix regex matcher, typically, I assume, from glibc.

    So I would imagine any issue you have would be with that precise underlying c implementation.

    Something like Text.Regex.TDFA.ByteString might give you clearer behavior in this case, since it is all in Haskell?