
building a dictionary from a string containing one or more tokens


given

import pyparsing as pp

lines = '''\
(xcoord -23899.747)
(ycoord 14349.544)
(elev 23899)
(region "mountainous")
(rate multiple)'''

leftParen    = pp.Literal('(')
rightParen   = pp.Literal(')')
doublequote  = pp.Literal('"')
v_string = pp.Word(pp.alphanums)
v_quoted_string = pp.Combine( doublequote + v_string + doublequote)
v_number = pp.Word(pp.nums+'.'+'-')

keyy = v_string
valu = v_string | v_quoted_string | v_number

item  = pp.Group( pp.Literal('(').suppress() + keyy + valu + pp.Literal(')').suppress() )
items = pp.ZeroOrMore( item)
dicct = pp.Dict( items)

pp.ParserElement.setDefaultWhitespaceChars('\r\n\t ')
print("item yields:", item.parseString(lines).dump())
print("items yields:", items.parseString(lines).dump())
print("dicct yields:", dicct.parseString(lines).dump())

gives

item yields: [['xcoord', '-23899.747']]
[0]:['xcoord', '-23899.747']
items yields: [['xcoord', '-23899.747']]
[0]:['xcoord', '-23899.747']
dicct yields: [['xcoord', '-23899.747']]
[0]:['xcoord', '-23899.747']

Hm. I'd expect to see five items within dicct. My use of Dict, ZeroOrMore and Group seems consistent with other examples on the net, yet only the first item gets matched. What am I doing wrong?

TIA,

code-warrior


Solution

  • This is easier to do than you might think. (It just takes weeks of practice for some of us.)

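    • v_number, which represents numeric values, and v_string, which represents unquoted string values, are fairly straightforward. Note that v_string accepts letters only, so it cannot match the leading digits of a number.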
    • I've used Combine with quoted strings so that the quotation marks are included with the strings in the parsed results.
    • I've used Group with key and value so that these values are paired in the output from the parser.
    • ZeroOrMore is there to allow for any number of key-value pairs, including zero.
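Why the grammar in the question stopped after the first item can be reproduced in isolation. The following is a minimal sketch, not the original poster's code: make_items, word_first and number_first are hypothetical names introduced here for illustration. It assumes the cause is the order of the alternatives joined with '|', which pyparsing tries left to right and does not revisit once one of them has matched.

```python
import pyparsing as pp

def make_items(value):
    """Zero or more '(key value)' groups built around the given value expression."""
    item = pp.Suppress('(') + pp.Group(pp.Word(pp.alphas) + value) + pp.Suppress(')')
    return pp.ZeroOrMore(item)

number = pp.Word(pp.nums + '.' + '-')
# Order used in the question: the alphanumeric word is tried before the number.
word_first = make_items(pp.Word(pp.alphanums) | number)
# Reordered: the number is tried before the alphanumeric word.
number_first = make_items(number | pp.Word(pp.alphanums))

text = '(xcoord -23899.747)\n(ycoord 14349.544)'

# For '14349.544', Word(alphanums) matches '14349' and stops at '.', so ')' fails.
# ZeroOrMore then ends quietly, returning only what matched so far.
partial = word_first.parseString(text).asList()

# parseAll=True turns the leftover input into an error instead.
try:
    word_first.parseString(text, parseAll=True)
    strict_failed = False
except pp.ParseException:
    strict_failed = True

full = number_first.parseString(text, parseAll=True).asList()

print(partial)
print(strict_failed)
print(full)
```

Here partial should hold only the xcoord pair, while full holds both pairs. Passing parseAll=True while developing a grammar is a cheap way to notice input that was silently left unparsed.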

    lines = '''\
    (xcoord -23899.747)
    (ycoord 14349.544)
    (elev 23899)
    (region "mountainous")
    (rate multiple)'''
    
    
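    import pyparsing as pp

    key = pp.Word(pp.alphas)
    # v_number is tried first; v_string is limited to letters, so it
    # cannot swallow the leading digits of a number.
    v_number = pp.Word(pp.nums + '.' + '-')
    v_string = pp.Word(pp.alphas)
    # Combine keeps the quotation marks attached to the string.
    v_quoted_string = pp.Combine(pp.Literal('"') + v_string + pp.Literal('"'))
    value = v_number | v_string | v_quoted_string
    # Group pairs each key with its value; the parentheses are suppressed.
    item = pp.Literal('(').suppress() + pp.Group(key + value) + pp.Literal(')').suppress()
    collection = pp.ZeroOrMore(item)

    # Build the dictionary from the parsed [key, value] pairs.
    result = {}
    for pair in collection.parseString(lines):
        result[pair[0]] = pair[1]

    for name in result:
        print(name, result[name])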
    

    Output:

    xcoord -23899.747
    ycoord 14349.544
    elev 23899
    region "mountainous"
    rate multiple
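Since the question asked about pp.Dict specifically, here is a hedged sketch of the same grammar with Dict doing the dictionary building instead of the explicit loop. It assumes, as in the question, that the Group wraps the whole item so that each group's first token is the key. The names parsed and result are only illustrative.

```python
import pyparsing as pp

lines = '''\
(xcoord -23899.747)
(ycoord 14349.544)
(elev 23899)
(region "mountainous")
(rate multiple)'''

key = pp.Word(pp.alphas)
v_number = pp.Word(pp.nums + '.' + '-')
v_string = pp.Word(pp.alphas)
v_quoted_string = pp.Combine(pp.Literal('"') + v_string + pp.Literal('"'))
value = v_number | v_string | v_quoted_string

# Each item is a Group whose first token is the key; Dict relies on that shape.
item = pp.Group(pp.Suppress('(') + key + value + pp.Suppress(')'))
collection = pp.Dict(pp.ZeroOrMore(item))

# parseAll=True raises ParseException if any input is left unparsed.
parsed = collection.parseString(lines, parseAll=True)
result = parsed.asDict()

print(result['xcoord'])
print(parsed.region)
```

The values stay strings, exactly as in the loop version, and the quoted value keeps its quotation marks because of Combine. Converting numbers to float would need a parse action or a post-processing step, which is outside what the original answer covers.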