pyparsing: match words on same line


In pyparsing I'm looking for a simple way to match words (or other expressions) that occur on the same line, i.e. without any newline in between them.


Solution

  • You can override the default whitespace-skipping characters for a particular parser element. In this case, word_on_the_same_line skips only spaces, not newlines, so a run of these words stops at the end of the line.

    import pyparsing as pp
    
    word = pp.Word(pp.alphas, pp.alphanums)
    
    # define special whitespace skipping, so that newlines aren't
    # skipped when matching a word_on_the_same_line; word() returns
    # a copy, so the original word keeps the default whitespace handling
    word_on_the_same_line = word().setWhitespaceChars(" ")
    
    # compare results with this version of word_on_the_same_line to see 
    # how pyparsing treats newlines as skippable whitespace
    # word_on_the_same_line = word()
    
    line = pp.Group(word("key") + word_on_the_same_line[...]("values"))
    
    test = """\
    key1 lsdkjf lskdjf lskjdf sldkjf
    key2 sdlkjf lskdj lkjss lsdj
    """
    
    print(line[...].parseString(test).dump())
    

    Prints:

    [['key1', 'lsdkjf', 'lskdjf', 'lskjdf', 'sldkjf'], ['key2', 'sdlkjf', 'lskdj', 'lkjss', 'lsdj']]
    [0]:
      ['key1', 'lsdkjf', 'lskdjf', 'lskjdf', 'sldkjf']
      - key: 'key1'
      - values: ['lsdkjf', 'lskdjf', 'lskjdf', 'sldkjf']
    [1]:
      ['key2', 'sdlkjf', 'lskdj', 'lkjss', 'lsdj']
      - key: 'key2'
      - values: ['sdlkjf', 'lskdj', 'lkjss', 'lsdj']
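
To make the comparison suggested in the code comments concrete, here is a minimal sketch that builds the parser both ways and counts the resulting groups. The helper name build_line_parser and the shortened test input are illustrative assumptions, not part of the original answer; the pyparsing calls are the same ones used above.

```python
import pyparsing as pp

word = pp.Word(pp.alphas, pp.alphanums)

def build_line_parser(same_line_only):
    """Hypothetical helper: return a parser for 'key value value ...' records."""
    value = word()  # calling an element with no arguments returns a copy
    if same_line_only:
        # skip only spaces before a value, so a newline ends the run of values
        value.setWhitespaceChars(" ")
    return pp.Group(word("key") + value[...]("values"))

test = "key1 a1 b1\nkey2 c2 d2\n"

# default whitespace skipping: values run on across the newline
default_result = build_line_parser(False)[...].parseString(test)

# space-only skipping: each line becomes its own group
same_line_result = build_line_parser(True)[...].parseString(test)

print(len(default_result), default_result.asList())
print(len(same_line_result), same_line_result.asList())
```

With the default settings the first group should swallow key2 and its values, because the newline is skipped like any other whitespace. With space-only skipping the parser should yield one group per line.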