parsing haskell parsec megaparsec

How to make a sub parser with Parsec?


I would like to parse several lists of commands with Parsec. The lists are laid out side by side in indented columns, like a table. For example, my lists are formatted like this:

Command1 arg1 arg2       Command1 arg1 arg2         Command1 arg1 arg2
Command2 arg1                                       Command3 arg1 arg2 arg3
                         Command3 arg1 arg2 arg3
                                                    Command4
Command3 arg1 arg2 arg3  Command2 arg1
                         Command4
Command4
Command5 arg1                                       Command2 arg1

These commands are supposed to be parsed column by column with state changes in the parser.

My idea is to gather the commands of each column into a separate list of strings and parse these strings with a subparser (executed inside the main parser).

I inspected the API of the Parsec library but I didn't find a function to do that.

I considered using runParser, but this function only extracts the result of the parser and not its state.

I also considered writing a function inspired by runParsecT and mkPT to make my own parser, but the ParsecT constructor and initialPos are not available (they are not exported by the library).

Is it possible to run a subparser inside a parser with Parsec?

If not, can a library such as megaparsec solve my problem?


Solution

  • Not a complete answer, more a question for clarification:

    Is it necessary to build a list of strings? I would prefer to parse the input and convert it into a more specific datatype. That way you can rely on the type guarantees of Haskell.

    I would begin by defining a datatype for my commands:

    data Command = Command1 Argtype1
                 | Command2 Argtype2
                 | Command3 Argtype1 Argtype2

    data Argtype1 = Arg1 | Arg2 | ArgX
    data Argtype2 = Arg2_1 | Arg2_2
    

    After that you can parse the input and put it into these datatypes.

    At the end of the parsing you can combine the partial results with mappend (for lists that is concatenation with (++); a single command is added at the front with (:)).

    You end up with a value of type [Command], which you can then work with further.
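
    As a rough illustration of this idea, here is a minimal Parsec sketch that turns text into [Command]. The concrete syntax is an assumption, not something given in the question: the keywords "arg1", "arg2", "argX", "arg2_1" and "arg2_2", as well as the helper names keyword, argtype1, argtype2, command and commands, are made up for this example.

    ```haskell
    import Text.Parsec
    import Text.Parsec.String (Parser)

    data Argtype1 = Arg1 | Arg2 | ArgX deriving (Show, Eq)
    data Argtype2 = Arg2_1 | Arg2_2 deriving (Show, Eq)

    data Command
      = Command1 Argtype1
      | Command2 Argtype2
      | Command3 Argtype1 Argtype2
      deriving (Show, Eq)

    -- Match a whole word, then skip trailing spaces and tabs (not newlines).
    -- 'try' makes a partial match backtrack so the next alternative can run.
    keyword :: String -> Parser ()
    keyword w =
      try (string w *> notFollowedBy (alphaNum <|> char '_'))
        *> skipMany (oneOf " \t")

    -- Assumed spellings for the argument values.
    argtype1 :: Parser Argtype1
    argtype1 = choice
      [ Arg1 <$ keyword "arg1"
      , Arg2 <$ keyword "arg2"
      , ArgX <$ keyword "argX"
      ]

    argtype2 :: Parser Argtype2
    argtype2 = choice
      [ Arg2_1 <$ keyword "arg2_1"
      , Arg2_2 <$ keyword "arg2_2"
      ]

    -- One command with the arguments its constructor expects.
    command :: Parser Command
    command = choice
      [ Command1 <$> (keyword "Command1" *> argtype1)
      , Command2 <$> (keyword "Command2" *> argtype2)
      , Command3 <$> (keyword "Command3" *> argtype1) <*> argtype2
      ]

    -- Zero or more commands separated by whitespace, up to end of input.
    commands :: Parser [Command]
    commands = spaces *> many (command <* spaces) <* eof
    ```

    With these definitions, parse commands "" "Command1 arg1\nCommand3 argX arg2_2\n" should give Right [Command1 Arg1,Command3 ArgX Arg2_2]. A command with a wrong or missing argument is rejected by the parser, which is the type guarantee mentioned above.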

    For parsing the text you can follow the megaparsec tutorial at https://markkarpov.com/megaparsec/parsing-simple-imperative-language.html


    Or do you mean something completely different? Perhaps each line (containing some commands) is, as a whole, one input to a state machine, and the state machine changes state according to the commands? Then I wonder why the state machine should be implemented as a parser.
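
    If the approach from the question (one string per column, parsed by a stateful sub-parser) is still what is wanted, the following is one possible sketch using only exported Parsec functions. The sub-parser returns its final user state together with its result via getState, and the outer parser stores that state with putState. Everything named here (Counter, columnParser, subParse, allColumns, example) is hypothetical, and the sketch assumes the columns have already been split into separate strings.

    ```haskell
    import Text.Parsec

    -- Hypothetical user state: the number of command lines seen so far.
    type Counter = Int

    -- Parse one column: every non-empty line is a command and bumps the state.
    columnParser :: Parsec String Counter [String]
    columnParser = many (commandLine <* modifyState (+ 1)) <* eof
      where
        commandLine = many1 (noneOf "\n") <* optional newline

    -- Run parser p on a separate input string, starting from the caller's
    -- current user state, and copy the sub-parser's final state back.
    subParse :: Parsec String u a -> String -> Parsec String u a
    subParse p input = do
      st <- getState
      case runParser ((,) <$> p <*> getState) st "column" input of
        Left err       -> fail (show err)
        Right (x, st') -> putState st' >> return x

    -- Parse the columns one after another, threading the state through.
    allColumns :: [String] -> Parsec String Counter [[String]]
    allColumns = mapM (subParse columnParser)

    -- The outer parser consumes no input of its own in this sketch.
    example :: Either ParseError ([[String]], Counter)
    example = runParser ((,) <$> allColumns cols <*> getState) 0 "main" ""
      where
        cols = ["Command1 arg1\nCommand2 arg1\n", "Command3 arg1\n"]
    ```

    Here example should evaluate to Right ([["Command1 arg1","Command2 arg1"],["Command3 arg1"]],3): the counter left by the first column is the starting state of the second. Error positions reported by the sub-parser refer to the column string, not to the original input, so this is a simplification.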