I wanted to replace sed
and awk
with Parsec. For example, extract number from strings like unknown structure but containing the number 42 and maybe some other stuff
.
I run into "unexpected end of input". I'm looking for equivalent of non-greedy .*([0-9]+).*
.
module Main where
import Text.Parsec
parser :: Parsec String () Int
parser = do
_ <- many anyToken
x <- read <$> many1 digit
_ <- many anyToken
return x
main :: IO ()
main = interact (show . parse parser "STDIN")
This can be easily done with my library regex-applicative. It gives you both the combinator interface and the features of regular expressions that you seem to want.
Here's a working version that's closest to your example:
{-# LANGUAGE ApplicativeDo #-}
import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)
parser :: RE Char Int
parser = do
_ <- few anySym
x <- decimal
_ <- many anySym
return x
main :: IO ()
main = interact (show . match parser)
Here's an even shorter version, using findFirstInfix
:
import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)
main :: IO ()
main = interact (snd3 . findFirstInfix decimal)
where snd3 (_, r, _) = r
If you want to perform actual tokenization (e.g. skip 93
in foo93bar
), then take a look at lexer-applicative, a tokenizer based on regex-applicative.