Tags: python, caching, io, lru, functools

Finding Cache Miss, Hit ratios in an I/O trace file


I have an I/O trace file with the fields ('asu', 'block_address', 'size', 'opcode', 'time_stamp') and over 5 million rows. The data looks like this:

0,20941264,8192,W,0.551706
0,20939840,8192,W,0.554041
0,20939808,8192,W,0.556202
1,3436288,15872,W,1.250720
1,3435888,512,W,1.609859
1,3435889,512,W,1.634761
0,7695360,4096,R,2.346628
1,10274472,4096,R,2.436645
2,30862016,4096,W,2.448003
2,30845544,4096,W,2.449733
1,10356592,4096,W,2.449733 

I am trying to add a cache layer to my project and want to count the cache hits and misses. I am using @functools.lru_cache(maxsize = None) to track hits and misses for block_address. Following a tutorial, I tried the code below, where blk_trace is the array of block_address values from the trace.

@functools.lru_cache(maxsize = None)
def blk_iter():
    blk_len = len(blk_trace)
    for i in range(0,blk_len):
        print(blk_trace[i])

Checking blk_iter.cache_info() gives CacheInfo(hits=0, misses=1, maxsize=None, currsize=1), which is not right. I am fairly new to Python and to caching concepts, and I don't know what I am doing wrong. How do I find the hits and misses for the block addresses?


Solution

  • lru_cache caches calls to the decorated function, here blk_iter, keyed on the arguments of each call. You called blk_iter only once, so the cache holds one entry and has recorded one miss.

    Consider the following function decorated with lru_cache:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def myfunc(x):
        # The body runs only on a cache miss.
        print('Cache miss: ', x)
        return x + 1
    

    The first time myfunc is called with a given value of x, the function runs and the result is stored in the cache. If it is called again with the same argument, the function does not run at all and the cached value is returned.

    >>> for i in range(3):
    ...     print(myfunc(i))
    ...
    Cache miss:  0
    1
    Cache miss:  1
    2
    Cache miss:  2
    3
    >>> myfunc(0) # this will be a cache hit
    1
    >>> myfunc(3) # this will be another miss
    Cache miss:  3
    4
    >>> myfunc.cache_info()
    CacheInfo(hits=1, misses=4, maxsize=None, currsize=4)   
    

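    In your example, even if the cache were set up so that each loop iteration made a cached call, you would still get all misses and no hits if the call were keyed on the loop index: for i in range(0, blk_len) produces a new value of i on every iteration, so no call ever repeats an earlier argument and the cache never hits. To count hits and misses per block address, the cached function has to be called with the block address itself, once per trace entry.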
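
As a minimal sketch of that idea (not necessarily how the final cache layer should look), the cached function below takes a block address as its argument and is called once per trace entry. The function name access_block and the contents of blk_trace are hypothetical: the addresses come from the sample rows in the question, with two repeats added for illustration, since the sample itself contains no repeated address.

```python
from functools import lru_cache

# Hypothetical trace: addresses from the question's sample rows, with two
# repeated accesses added so that the cache has something to hit.
blk_trace = [20941264, 20939840, 20939808, 20941264, 20939840, 3436288]


# maxsize=None gives an unbounded cache; passing an integer instead makes
# lru_cache evict the least recently used entry once that many are stored.
@lru_cache(maxsize=None)
def access_block(block_address):
    # The body runs only on a miss. The return value is a placeholder,
    # because only the hit/miss bookkeeping matters here.
    return block_address


for address in blk_trace:
    access_block(address)

info = access_block.cache_info()
hit_ratio = info.hits / (info.hits + info.misses)

print(info)
print(f"hit ratio: {hit_ratio:.2f}")
```

With this made-up trace, the four distinct addresses each miss once and the two repeated accesses hit, so cache_info() reports hits=2 and misses=4. For the real trace, blk_trace would be the block_address column of the file, and the result depends on how often addresses actually repeat.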