I'm trying to use Idris 2's Text.Parser
library to parse a pre-tokenized byte stream. I wrote the following utility function in the style of Parsec's expression parser:
module Text.Parser.Expression
import Text.Parser
public export
data Assoc
= AssocNone
| AssocLeft
| AssocRight
public export
data Op state k a
= Prefix (Grammar state k True (a -> a))
| Infix (Grammar state k True (a -> a -> a)) Assoc
public export
OpTable : Type -> Type -> Type -> Type
OpTable state k a = List (List (Op state k a))
public export
expressionParser :
OpTable state k a ->
Grammar state k True a ->
Grammar state k True a
expressionParser table term = foldl level term table
where
level : Grammar state k True a -> List (Op state k a) -> Grammar state k True a
level factor ops = choiceMap toP ops <|> factor
where
toP : Op state k a -> Grammar state k True a
toP (Infix op AssocNone) = do
x <- factor
f <- op
y <- factor
pure $ f x y
toP (Infix op AssocLeft) = do
x <- factor
fs <- some (flip <$> op <*> factor)
pure $ foldl (flip ($)) x fs
toP (Infix op AssocRight) = do
fs <- some (factor >>= \x => op <*> pure x)
y <- factor
pure $ foldr ($) y fs
toP (Prefix op) = op <*> factor
For certain inputs, this seems to scale really badly with the number of operator definitions. Here's a cut-down example:
public export
Number : Type
Number = Double
public export
data Fun
= IntFun
| Rnd
public export
data BinOp
= Eq
| NEq
| LT
| LE
| GT
| GE
| Plus
| Minus
| Mul
| And
| Or
public export
data Expr
= NumLitE Number
| Bin BinOp Expr Expr
| FunE Fun (List1 Expr)
public export
Show Fun where
show IntFun = "INT"
show Rnd = "RND"
public export
Show BinOp where
show Eq = "="
show NEq = "<>"
show LT = "<"
show LE = "<="
show GT = ">"
show GE = ">="
show Plus = "+"
show Minus = "-"
show Mul = "*"
show And = "AND"
show Or = "OR"
public export
Show Expr where
show (NumLitE n) = show n
show (Bin op x y) = unwords [show x, show op, show y]
show (FunE f args) = show f ++ "(" ++ show args ++ ")"
bits8 : Bits8 -> Grammar state Bits8 True ()
bits8 x = terminal ("Byte " ++ show x) $ \x' => if x == x' then Just () else Nothing
lexeme : {c : Bool} -> Grammar state Bits8 c a -> Grammar state Bits8 c a
lexeme p = afterMany (bits8 0x20) p
comma : Grammar state Bits8 True ()
comma = lexeme $ bits8 0x2c
parens : {c : Bool} -> Grammar state Bits8 c a -> Grammar state Bits8 True a
parens = between (lexeme $ bits8 0x28) (lexeme $ bits8 0x29)
digit : Grammar state Bits8 True Bits8
digit = terminal "digit" $ \x =>
toMaybe (0x30 <= x && x <= 0x39) x
digitLit : (Num a) => Grammar state Bits8 True a
digitLit = fromInteger . cast . (\x => x - 0x30) <$> digit
numLit : (Num a, Neg a) => Grammar state Bits8 True a
numLit {a} = fromDigits <$> lexeme sign <*> lexeme (some digitLit)
where
fromDigits : Bool -> List1 a -> a
fromDigits neg =
(if neg then negate else id) .
foldl (\x => \y => x * 10 + y) (the a 0)
sign : Grammar state Bits8 False Bool
sign = option False $ True <$ bits8 0xab
expr : Grammar state Bits8 True Expr
expr = expressionParser table term <|> fail "expression"
where
table : List (List (Op state Bits8 Expr))
table =
[ [ Infix (lexeme $ Bin Mul <$ bits8 0xac) AssocLeft
]
, [ Infix (lexeme $ Bin Plus <$ bits8 0xaa) AssocLeft
, Infix (lexeme $ Bin Minus <$ bits8 0xab) AssocLeft
]
, -- This next group is the one I will keep shrinking
[ Infix (lexeme $ Bin Eq <$ bits8 0xb2) AssocNone
, Infix (lexeme $ Bin NEq <$ (bits8 0xb3 *> bits8 0xb1)) AssocNone
, Infix (lexeme $ Bin GE <$ (bits8 0xb1 *> bits8 0xb2)) AssocNone
, Infix (lexeme $ Bin GT <$ bits8 0xb1) AssocNone
, Infix (lexeme $ Bin LE <$ (bits8 0xb3 *> bits8 0xb2)) AssocNone
, Infix (lexeme $ Bin LT <$ bits8 0xb3) AssocNone
]
, [ Infix (lexeme $ Bin And <$ bits8 0xaf) AssocLeft
, Infix (lexeme $ Bin Or <$ bits8 0xb0) AssocLeft
]
]
fun : Grammar state Bits8 True Fun
fun = lexeme $ choice
[ IntFun <$ bits8 0xb5
, Rnd <$ bits8 0xbb
]
term : Grammar state Bits8 True Expr
term =
NumLitE <$> numLit
<|> FunE <$> fun <*> parens (sepBy1 comma expr)
<|> parens expr
For measurement, I have tried parsing [181,40,40,187,40,49,41,172,51,41,170,49,41]
while removing the parsing rules for Eq
, NEq
, ..., Lt
. Here is the user time of parsing the above list of bytes with the number of rules not commented out in that parsing rule group:
n | usr (seconds) |
---|---|
1 | 0.41 |
2 | 1.56 |
3 | 4.67 |
4 | 13.92 |
5 | 25.71 |
6 | 45.92 |
What is going on here?
I fixed this by copying more of Parsec's design. As can be seen at that link, the important idea is to parse a leading term just once, and then parse a chain of associative operators and operands following it. This avoids repeated re-parsing of higher-precedence terms, which is what's causing the slowdown in the code in the question.