Haskell unicode pattern matching


I am about to start developing an application in Haskell that requires some Unicode support.

How do I perform Unicode pattern matching in Haskell? I saw GHC's syntax extension, but is there any language-level support for this (without needing a GHC-specific extension)?

I saw this question, but the answer given there uses an extension-based approach. Also, what is the best Haskell library for working with Unicode: ByteString or Text? What are the advantages and disadvantages of each?


Solution

  • As far as I can tell, pattern matching on Unicode characters works out of the box. Try this:

    f ('薬':rest) = rest
    f _           = "Your string doesn't begin with 薬"
    
    main = do
      putStrLn (f "薬は絶対飲まへん!")
      putStrLn (f "なぜ?死にたいのか?")
    

    Regarding libraries, you definitely want Text rather than ByteString. Text is actually meant for working with text: it counts the length of strings by character rather than by byte, and so on. ByteString is just an immutable array of bytes with a few extra frills, better suited to storing and transmitting binary data.
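
    To make the character-versus-byte difference concrete, here is a small sketch. It assumes the text and bytestring packages (both ship with GHC) and UTF-8 as the encoding; the names sample, charCount, and byteCount are made up for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical sample value, chosen for illustration only.
sample :: T.Text
sample = T.pack "薬"

-- Text measures length in characters.
charCount :: Int
charCount = T.length sample

-- ByteString measures length in bytes; here, the bytes of the
-- UTF-8 encoding of the same text.
byteCount :: Int
byteCount = B.length (TE.encodeUtf8 sample)
```

    In GHCi, charCount evaluates to 1 and byteCount to 3, because 薬 (U+85AC) takes three bytes in UTF-8.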

    As for pattern matching directly on ByteString, Text, etc., it's simply not possible without extensions, since they are opaque types with deliberately hidden implementations. You can, however, pattern match on individual characters inside the many higher-order functions that operate on Text/ByteString:

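    import qualified Data.Text as T
    import System.IO (hFlush, stdout)
    
    -- Fold step: bump the count for each 't' or 'T'.
    countTs :: Int -> Char -> Int
    countTs n 't' = n + 1
    countTs n 'T' = n + 1
    countTs n _   = n
    
    main :: IO ()
    main = do
      putStr "Please enter some text> "
      hFlush stdout  -- make sure the prompt appears before we wait for input
      str <- T.pack `fmap` getLine
      let ts = T.foldl' countTs 0 str
      putStrLn ("Your text contains " ++ show ts ++ " letters t!")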
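
    If you want to stay within standard Haskell, one way to get the effect of the first example on Text is to take the value apart with a library function and match on the result. A minimal sketch using T.uncons from the text package; dropKusuri is a hypothetical name, and the fallback message mirrors the String version above:

```haskell
import qualified Data.Text as T

-- Text counterpart of the String example. T.uncons returns
-- Just (firstChar, rest), or Nothing for empty input, and the
-- case expression matches on that ordinary Maybe/tuple value.
dropKusuri :: T.Text -> T.Text
dropKusuri t = case T.uncons t of
  Just ('薬', rest) -> rest
  _                 -> T.pack "Your string doesn't begin with 薬"
```

    This needs no language extension, since the pattern is on a Maybe (Char, Text) value rather than on Text itself. T.stripPrefix can be used the same way for multi-character prefixes.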
    

    I wouldn't worry about using extensions if I were you, though. GHC is the de facto standard Haskell compiler, so it's highly unlikely that you'll ever need to compile your code with anything else.
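
    For comparison, here is roughly what an extension-based version could look like. Neither the question nor this answer names a specific extension, so this sketch assumes ViewPatterns; dropKusuriV is a hypothetical name.

```haskell
{-# LANGUAGE ViewPatterns #-}
import qualified Data.Text as T

-- With ViewPatterns, the argument is first passed through T.uncons
-- and the result is matched directly in the function head.
dropKusuriV :: T.Text -> T.Text
dropKusuriV (T.uncons -> Just ('薬', rest)) = rest
dropKusuriV _ = T.pack "Your string doesn't begin with 薬"
```

    The behavior is the same as matching on T.uncons in a case expression; the extension only moves the call into the pattern.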