python, dictionary, tokenize, string-parsing

Python - tokenizing, replacing words


I'm trying to generate sentences with random words inserted into them. To be specific, I'd have something like:

"The weather today is [weather_state]."

and I'd like to find all tokens in [brackets] and then exchange each one for a randomly chosen counterpart from a dictionary or a list, leaving me with:

"The weather today is warm."
"The weather today is bad."

or

"The weather today is mildly suiting for my old bones."

Keep in mind that the [bracket] token wouldn't always be in the same position, and there could be multiple bracketed tokens in one string, like:

"[person] is feeling really [how] today, so he's not going [where]."

I really don't know where to start with this, or whether the tokenize or token modules are even the best tool for it. Any hints that point me in the right direction would be greatly appreciated!

EDIT: Just for clarification, I don't really need to use square brackets, any non-standard character will do.


Solution

  • You're looking for re.sub with a callback function:

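    import random
    import re

    # Candidate replacements for each placeholder name.
    words = {
        'person': ['you', 'me'],
        'how': ['fine', 'stupid'],
        'where': ['away', 'out']
    }

    def random_str(m):
        # m.group(1) is the name captured between the brackets.
        return random.choice(words[m.group(1)])


    text = "[person] is feeling really [how] today, so he's not going [where]."
    # re.sub calls random_str once per match and inserts its return value.
    print(re.sub(r'\[(.+?)\]', random_str, text))

    # Example output (varies per run):
    # me is feeling really stupid today, so he's not going away.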
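
One caveat with the callback above: a placeholder that has no entry in the dictionary raises a KeyError. Here is a minimal sketch of one way around that, assuming you would rather leave unknown tokens untouched. The helper name fill_template, the WORDS table, and the optional rng parameter are all made up for illustration and are not part of the original answer.

```python
import random
import re

# Sample word lists, same shape as the dictionary in the answer.
WORDS = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
    'where': ['away', 'out'],
}

# [^\[\]]+ matches one or more characters that are not square brackets.
PLACEHOLDER = re.compile(r'\[([^\[\]]+)\]')


def fill_template(text, words, rng=random):
    """Replace each [name] with a random choice from words[name].

    Hypothetical helper: names missing from `words` (or mapped to an
    empty list) are left in the text as-is, brackets included.
    """
    def replace(match):
        choices = words.get(match.group(1))
        if not choices:
            return match.group(0)  # unknown or empty: keep the token unchanged
        return rng.choice(choices)

    return PLACEHOLDER.sub(replace, text)


print(fill_template("[person] is not going [where] with [pet].", WORDS))
```

Passing something like random.Random(42) as rng should make the output repeatable, which can be handy in tests.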
    

    Note that, unlike the str.format method, the callback approach allows for more sophisticated processing of placeholders, e.g.

    [person:upper] got $[amount if amount else 0] etc
    

    Basically, you can build your own "templating engine" on top of that.
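
As a rough sketch of that idea, the callback can parse an optional modifier after a colon, as in [person:upper]. This handles only the name:modifier form. The modifier names (upper, lower, title), the render function, and the WORDS table are assumptions made for illustration, and the conditional form ([amount if amount else 0]) is not handled here.

```python
import random
import re

WORDS = {
    'person': ['you', 'me'],
    'how': ['fine', 'stupid'],
}

# Hypothetical modifier table: the names after the colon are an assumption.
MODIFIERS = {
    'upper': str.upper,
    'lower': str.lower,
    'title': str.title,
}

# Group 1 is the placeholder name, optional group 2 is the modifier.
TOKEN = re.compile(r'\[(\w+)(?::(\w+))?\]')


def render(text, words, rng=random):
    """Fill [name] and [name:modifier] tokens with random words."""
    def replace(match):
        name, modifier = match.group(1), match.group(2)
        if name not in words or (modifier and modifier not in MODIFIERS):
            return match.group(0)  # leave anything unrecognised untouched
        value = rng.choice(words[name])
        return MODIFIERS[modifier](value) if modifier else value

    return TOKEN.sub(replace, text)


print(render("[person:upper] is feeling really [how] today.", WORDS))
```

Keeping the modifiers in a dictionary means new ones can be added without touching the regular expression or the callback.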