Search code examples
pythonnumpydictionarypostal-code

Faster searching in Python - Postcodes


I have been working on a no-sql solution to naming a list of N postcodes using a national list of postcodes. So far I have my reference dictionary for the state of NSW in the form :

{'Belowra': 2545, 'Yambulla': 2550, 'Bingie': 2537, ... [n=4700]

My function uses this to look up the names of a postcode:

def look_up_sub(pc, settings):
    output=[]
    for suburb, postcode in postcode_dict.items():
        if postcode == pc and settings=='random':#select match at random
            print(suburb)                        #remove later
            output.append(suburb)
            break                                #stop searching for matches
        elif postcode == pc and settings=='all': #print all possible names for postcode
            print(suburb)                        #remove later
    return output 

N=[2000,2020,2120,2019]
for i in N:
    look_up_sub(i, 'random')

>>>Millers Point
>>>Mascot
>>>Westleigh
>>>Banksmeadow

While ok for small lists, when N is sufficiently large this inefficient approach is very slow. I have been thinking about how I could use numpy arrays to speed this up considerably and am looking for faster ways to approach this.


Solution

  • Your data structure is backwards, it should go from postcode:suburb and then when you pass it a pc you get a list of suburbs back, then either select from that list randomly or print all of them in the list. Here is what you should do, first, reverse your dict:

    import defaultdict
    post_to_burb = defaultdict(list)
    for suburb, postcode in postcode_dict.items():
        post_to_burb[postcode].append(suburb)
    

    Now, your function should do something like:

    import random
    def look_up_sub(pc, settings):
        output = []
        if settings == "random":
            output.append(random.choice(post_to_burb[pc]))
        elif settings == 'all':
            output.extend(post_to_burb[pc])
        return output
    

    Using numpy here would be unweildy, especially since you are working with strings. You might get some marginal imporvemnt in runtime, but your overall algorithm will still be linear time. Now it is constant time, once you've set up your post_to_burb dict.