How to step through a large ordered dictionary in Python 3.7?

Lately I've been refactoring some bash scripts into Python 3.7 as both a learning exercise and for real use in a project. The resulting implementation uses a very large ordered dictionary, say around 2 to 3 million entries. Storing the data this way has some significant advantages, reducing code complexity and processing time. However, there is one task that eludes me: how to step through the dictionary from a known starting point.

If I were doing this in C, I would make pointer to the desired start point and walk the pointer. If there is any analogous operation in Python, I don't know it and can't find it. All the techniques I found seem to duplicate the some/all of information into a new list, which would be time consuming and waste a lot of memory in my application. It also seems that you can't slice a dictionary, even though they are now ordered by default.

Consider this contrived example dictionary of the Latin alphabet, whose strangely keyed entries are grouped by vowels and consonants, and the entries within each group are sorted alphabetically:

dd = { #   key:  (  phonetic, letter, ascii, ebcedic, baudot, morse,  hollerith, strokes,  kind     )
    4296433290:  ( 'Alfa',     'A',    65,     193,     3,    '.-',     (12,1),     3,    'vowl'    ),
    5046716526:  ( 'Echo',     'E',    69,     197,     1,    '.',      (12,5),     4,    'vowl'    ),
    5000200584:  ( 'India',    'I',    73,     201,     6,    '..',     (12,9),     3,    'vowl'    ),
    5000971262:  ( 'Oscar',    'O',    79,     214,     24,   '---',    (11,6),     1,    'vowl'    ),
    5000921625:  ( 'Uniform',  'U',    85,     228,     7,    '..-',    (0,4),      1,    'vowl'    ),
    4297147083:  ( 'Yankee',   'Y',    89,     232,     21,   '-.--',   (0,8),      3,    'vowl'    ),
    4297256046:  ( 'Bravo',    'B',    66,     194,     25,   '-...',   (12,2),     3,    'cons'    ),
    4298140290:  ( 'Charlie',  'C',    67,     195,     14,   '-.-.',   (12,3),     1,    'cons'    ),
    5036185622:  ( 'Delta',    'D',    68,     196,     9,    '-..',    (12,4),     2,    'cons'    ),
    5036854221:  ( 'Foxtrot',  'F',    70,     198,     13,   '..-.',   (12,6),     3,    'cons'    ),
    5037458768:  ( 'Golf',     'G',    71,     199,     26,   '--.',    (12,7),     2,    'cons'    ),
    5035556903:  ( 'Hotel',    'H',    72,     200,     20,   '....',   (12,8),     3,    'cons'    ),
    5037119814:  ( 'Juliett',  'J',    74,     209,     11,   '.---',   (11,1),     2,    'cons'    ),
    5035556831:  ( 'Kilo',     'K',    75,     210,     15,   '-.-',    (11,2),     3,    'cons'    ),
    4296755665:  ( 'Lima',     'L',    76,     211,     18,   '.-..',   (11,3),     2,    'cons'    ),
    5035557110:  ( 'Mike',     'M',    77,     212,     28,   '--',     (11,4),     4,    'cons'    ),
    5037118125:  ( 'November', 'N',    78,     213,     12,   '-.',     (11,5),     3,    'cons'    ),
    5000423356:  ( 'Papa',     'P',    80,     215,     22,   '.--.',   (11,7),     2,    'cons'    ),
    5000923300:  ( 'Quebec',   'Q',    81,     216,     23,   '--.-',   (11,8),     2,    'cons'    ),
    5000969482:  ( 'Romeo',    'R',    82,     217,     10,   '.-.',    (11,9),     3,    'cons'    ),
    5035943840:  ( 'Sierra',   'S',    83,     226,     5,    '...',    (0,2),      1,    'cons'    ),
    5045251209:  ( 'Tango',    'T',    84,     227,     16,   '-',      (0,3),      2,    'cons'    ),
    5000168680:  ( 'Victor',   'V',    86,     229,     30,   '...-',   (0,5),      2,    'cons'    ),
    4296684445:  ( 'Whiskey',  'W',    87,     230,     19,   '.--',    (0,6),      4,    'cons'    ),
    5000923277:  ( 'Xray',     'X',    88,     231,     29,   '-..-',   (0,7),      2,    'cons'    ),
    4296215569:  ( 'Zulu',     'Z',    90,     233,     17,   '--..',   (0,9),      3,    'cons'    ),
}

And let's say I want to perform some processing on the consonants. And since the processing takes a lot of time (think days), I would like to do it in chunks. In this case, let's say 4 consonants at a time. I know ahead of time the keys for the beginning of a group, for example:

vowlbeg = 4296433290 # key of first vowel
consbeg = 4297256046 # key of first consonant

But I can't figure out how to take advantage of this foreknowledge. For example, to process the 8th through 11th consonant, the best I can do is:

beg = 8 # begin processing with 8th consonant
end = 12 # end processing with 11th consonant
kind = 'cons' # desired group
i=-1
for d in dd.items():
    if d[1][-1] is not kind: continue
    i += 1
    if i < beg: continue
    if i >= end: break
    print('processing:', i, d)

Which gives the desired results, albeit a bit slowly because I'm walking through the whole dictionary, from the beginning, until I encounter the desired entries.

processing: 8 (5035556831, ('Kilo', 'K', 75, 210, 15, '-.-', (11, 2), 3, 'cons'))
processing: 9 (4296755665, ('Lima', 'L', 76, 211, 18, '.-..', (11, 3), 2, 'cons'))
processing: 10 (5035557110, ('Mike', 'M', 77, 212, 28, '--', (11, 4), 4, 'cons'))
processing: 11 (5037118125, ('November', 'N', 78, 213, 12, '-.', (11, 5), 3, 'cons'))

I think I can express this loop more compactly using list or maybe dictionary comprehensions, but it seems that would create a huge duplicate in memory. Maybe the method above does that, too, I'm not 100% sure.

Things I know about my ordered dictionary

the groups, e.g., vowels and consonants, are indeed grouped and not scattered.
within each group, the entries are sorted in a known, desired order,
the beginning key of each group

Q: Is there a better way to do this? My backup plan is to just bite the bullet and keep a duplicate set of tuples, one per group, in order to be able to slice it. But that will essentially double my memory, as best I understand it.

Note: It's not evident from this silly example, but being able to access entries by keys in a single dictionary is a HUGE advantage in my application.

Solution

For a simple solution using the Python built-ins, you can create a list of keys and then start from any point in the list at the expense of some memory usage for materializing the list. See below for an interactive session demonstrating the point.

It should be easy to do a loop over any range of keys using this technique.

1> data = {id: (id, "a") for id in range(10)}

2> data
{0: (0, 'a'), 1: (1, 'a'), 2: (2, 'a'), 3: (3, 'a'), 4: (4, 'a'), 5: (5, 'a'), 6: (6, 'a'), 7: (7, 'a'), 8: (8, 'a'), 9: (9, 'a')}

3> data.keys()
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

4> data.keys()[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_keys' object does not support indexing

5> keys = list(data.keys())

6> keys[5]
5

7> data[keys[5]]
(5, 'a')

Step 1: create some sample data similar to yours
Step 2: Demonstrate the structure
Step 3: Get the dict_keys for the structure
Step 4: Demonstrate that you can't jump to a specific point in the list in the dict_keys native form
Step 5: Materialise the dict_keys as an actual list structure
Step 6: Demonstrate getting a key from anywhere in the list
Step 7: Pull the data from the dict using an arbitrary key