python dictionary string-parsing

How to quickly strip a prefix from a string value in a dictionary?


I have a dictionary in Python:

dict1 = {'first': 'ABCDE', 'second': 12345, 'third': 'KITTY', 'four': 'dogcatbirdelephant', ...}

To be clear, I'm parsing data and throwing into a dictionary in Python.

My problem: sometimes the values for 'third' carry a prefix. Instead of KITTY or CAT, I get A:KITTY or K:CAT. The prefix can be any letter, and a colon always separates the value I want (e.g. KITTY) from the prefix I don't (A:).

However, not all values are like this. Some are actually strings with no prefix.

How could one parse these dictionary values so that I save "everything that comes after the colon"? Would one check with a for statement? (I would prefer to avoid this, as I think there would be a substantial performance hit.)


Solution

  • @PatrickHaugh's answer is correct. You'll probably want to do a bit of filtering, since your example dictionary holds an integer as well as strings.

    Your question says "I'm parsing data and throwing into a dictionary", so I'm assuming the key/value pairs arrive from somewhere as two-tuples, rather than from another dictionary.

    If you already have the data in a dictionary, then you are going to have to loop over the keys.

    #!/usr/bin/env python
    
    class Kitty(object):
        def __init__(self):
            self.d = {}
    
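        def meow(self, k, v):
            """Store numeric values as-is; strip any prefix up to the colon from strings."""
            try:
                int(v)  # succeeds for integers and numeric strings
                self.d[k] = v
            except ValueError:
                # not numeric: keep only the text after the last colon
                self.d[k] = v.split(":")[-1]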
    
    if __name__ == "__main__":
        kitty = Kitty()
        kitty.meow("first", 12345)
        kitty.meow("second", "A:KITTY")
        kitty.meow("third", "B:KITTY")
        kitty.meow("fourth", "C:KITTY")
        kitty.meow("fifth", "KITTY")
        kitty.meow("sixty", "kreplach")
    
        print(kitty.d)
    

    This results in:

    {'third': 'KITTY', 'second': 'KITTY', 'fourth': 'KITTY', 'sixty': 'kreplach', 'fifth': 'KITTY', 'first': 12345}
    

    As for "efficient", that's another question. Python's string methods are pretty danged efficient; how you feed the data to your function is your decision.
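
For the case where the data is already in a dictionary, the loop over the keys can be written as a single dictionary comprehension. This is only a minimal sketch: the helper name `strip_prefix` and the sample values are made up for illustration, and it assumes that non-string values (such as the integer) should pass through unchanged. It uses `split(":", 1)` rather than `split(":")` so that a value containing further colons keeps everything after the first one.

```python
def strip_prefix(value):
    """Return the text after the first colon; leave non-strings untouched."""
    if isinstance(value, str):
        # maxsplit=1 keeps any later colons in the value intact;
        # a string without a colon comes back unchanged
        return value.split(":", 1)[-1]
    return value


# Hypothetical sample data modeled on the question's dict1
dict1 = {"first": "ABCDE", "second": 12345, "third": "A:KITTY", "four": "K:CAT"}

# One pass over the items, building a cleaned copy
cleaned = {key: strip_prefix(value) for key, value in dict1.items()}
print(cleaned)
```

The comprehension still visits every item once, which is unavoidable if every value may carry a prefix. If only one key (say 'third') ever has prefixes, calling `strip_prefix` on that single value avoids the loop entirely.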