Search code examples
pythoncollectionsbinary-search-treedefaultdict

Implementing binary search tree for defaultdict type of collection in python


I have googled a code for searching a key in python defaultdict using binary search. I have implemented it as

import collections
import bisect

class MyDict(collections.Mapping):
  def __init__(self, contents):
    "contents must be a sequence of key/value pairs"
    self._list = sorted(contents)
  def __iter__(self):
    return (k for (k, _) in self._list)
  def __contains__(self, k):
    i = bisect.bisect_left(self._list, (k, None))
    return i < len(self._list) and self._list[i][0] == k
  def __len__(self):
    return len(self._list)
  def __getitem__(self, k):
    i = bisect.bisect_left(self._list, (k, None))
    if i >= len(self._list): raise KeyError(k)
    return self._list[i][1]


mydictionary = {'60': set(['US0596951063']), '61': set(['']), '63': set(['US3385171059']), '65': set(['US0936981085']), '67': set(['']), '69': set(['']), '24': set(['', 'US0191181082']), '25': set(['']), '26': set(['US0301371037']), '27': set(['US0295951059']), '20': set(['', 'US0320157037', 'US0320151097']), '21': set(['']), '22': set(['']), '23': set(['']), '29': set(['']), '2': set(['']), '6': set(['']), '8': set(['US0077371096']), '99': set(['']), '98': set(['']), '91': set(['US1104481072']), '15': set(['']), '93': set(['']), '92': set(['US0976981045']), '94': set(['', 'US1025651084']), '97': set(['US0885771016']), '96': set(['', 'US0557101070']), '11': set(['']), '10': set(['']), '13': set(['']), '12': set(['US9099121074']), '59': set(['US09247Q1067']), '58': set(['US0576652004']), '17': set(['']), '16': set(['']), '19': set(['IL0006320183']), '54': set(['US0405151087']), '51': set(['']), '50': set(['US05343P1093']), '53': set(['']), '18': set(['US0080153077']), '90': set(['US0682211000']), '88': set(['', 'US0826571079']), '82': set(['US0582641025']), '80': set(['US0571491069']), '81': set(['US0928281023']), '86': set(['', 'US1030431050']), '87': set(['US05564T1034']), '84': set(['US0565251081']), '46': set(['US0303711081']), '47': set(['', 'US00753P1030', 'US00163U1060']), '45': set(['US22149T1025', 'US2274781044']), '42': set(['US2947007033']), '40': set(['US61747Y6914']), '41': set(['', 'US8814482035']), '5': set(['US0245914066', 'US0245911096']), '7': set(['US0048161048']), '9': set(['US0063513081']), '76': set(['US0905722072']), '74': set(['LR000A0D81P7']), '72': set(['US0668491006']), '71': set(['CA08135F1071']), '102': set(['US1484111018']), '100': set(['US13971R3066']), '101': set(['US1330341082']), '79': set(['']), '78': set(['']), '33': set(['', 'US00215F1075', 'US0490792050']), '32': set(['', 'US0373471012']), '31': set(['']), '30': set([''])}

d = MyDict(mydictionary)
r = d.__getitem__('84')
print r

i'm newbie to python and algorithms, can anyone help me please.

I am getting this below error

Traceback (most recent call last):
  File "search.py", line 24, in <module>
    i = d.__getitem__('84')
  File "search.py", line 17, in __getitem__
    if i >= len(self._list): raise KeyError(k)
KeyError: '84'

Solution

  • The problem is that when you pass a dictionary to your __init__:

    self._list = sorted(contents)
    

    Then self._list is a list of the keys of the dict:

    In [5]: my_dict
    Out[5]: {'a': 1, 'b': 2}
    
    In [6]: list(my_dict)
    Out[6]: ['a', 'b']
    
    In [7]: sorted(my_dict)
    Out[7]: ['a', 'b']
    

    So now your self._list attribute is a list of strings, but then in __getitem__:

     i = bisect.bisect_left(self._list, (k, None))
    

    You try to bisect it with a tuple, ('84', None') forcing a comparison between a string and a tuple, but in Python 2, unfortunately, this is allowed and always evaluates a string as less than a tuple.

    From the Python 2 docs

    CPython implementation detail: Objects of different types except numbers are ordered by their type names; objects of the same types that don’t support proper comparison are ordered by their address.

    Thus, consider:

    In [7]: sorted(my_dict)
    Out[7]: ['a', 'b']
    
    In [8]: import bisect
    
    In [9]: _list = sorted(my_dict)
    
    In [10]: bisect.bisect_left(_list, ('a', None))
    Out[11]: 2
    

    Since your tuple automatically evaluates to greater than everything in your list, it returns len(self._list). Thus triggering your condition to raise KeyError.

    To get a sequence of key/value pairs from a dictionary, use:

    mydictionary.items()
    

    Note, in Python 3, order comparisons between different types is not generally allowed. Instead, a TypeError is thrown, which would have been more helpful:

    <ipython-input-1-8101e6bd7098> in __getitem__(self, k)
         14     return len(self._list)
         15   def __getitem__(self, k):
    ---> 16     i = bisect.bisect_left(self._list, (k, None))
         17     if i >= len(self._list): raise KeyError(k)
         18     return self._list[i][1]
    
    TypeError: '<' not supported between instances of 'str' and 'tuple'