python numpy sparse-matrix

Find n greatest numbers in a sparse matrix


I am using sparse matrices as a means of lossy data compression: I create a sparse dictionary from all the values whose magnitude is greater than a specified threshold. I'd like the compressed data size to be a variable that my user can choose.

My problem is that I have a sparse matrix with a lot of near-zero values, and I must choose a threshold so that my sparse dictionary has a specific size (or, eventually, so that the reconstruction error has a specific rate). Here's how I create my dictionary (taken from Stack Overflow, I think):

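import numpy as np

mega_range = np.arange(smat.shape[0])  # all indices of smat
n = np.abs(smat) > threshold  # boolean mask; smat is flattened (1D)
i = mega_range[n]  # indices of the values that are kept
v = smat[n]  # the kept values themselves
sparse_dict = dict(zip(i, v))  # on Python 2, itertools.izip avoids building a list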

How can I find the threshold so that it is equal to the nth greatest value of my array (smat)?


Solution

  • scipy.stats.scoreatpercentile(arr,per) returns the value at a given percentile:

    import scipy.stats as ss
    print(ss.scoreatpercentile([1, 4, 2, 3], 75))
    # 3.25
    

    The value is interpolated if the desired percentile lies between two points in arr.

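    So if you set per = 100.0 * (len(smat) - n) / len(smat) (per is a percentile on a 0 to 100 scale, as in the example above, and the float factor avoids integer division on Python 2), then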

    threshold = ss.scoreatpercentile(abs(smat), per)
    

    should give you (close to) the nth greatest absolute value of the array smat.
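
If an exact element of the array is needed rather than an interpolated value, one alternative (not part of the answer above) is numpy.partition, which finds the element at a given sorted position without fully sorting. This is only a minimal sketch, assuming smat is a 1-D NumPy array; nth_greatest_abs is a hypothetical helper name and the sample data are made up for illustration:

```python
import numpy as np

def nth_greatest_abs(values, n):
    """Return the n-th greatest absolute value of a 1-D array (n=1 is the maximum)."""
    magnitudes = np.abs(np.asarray(values))
    if not 1 <= n <= magnitudes.size:
        raise ValueError("n must be between 1 and the array length")
    # np.partition puts the element that belongs at index k in sorted order
    # at position k, without sorting the rest of the array.
    k = magnitudes.size - n
    return np.partition(magnitudes, k)[k]

# Made-up sample data standing in for the flattened matrix.
smat = np.array([0.01, -0.5, 0.002, 0.3, -0.04, 0.9])
threshold = nth_greatest_abs(smat, 3)

# >= keeps the n-th value itself; a strict > would keep only n - 1 values.
kept = np.abs(smat) >= threshold
sparse_dict = dict(zip(np.flatnonzero(kept).tolist(), smat[kept].tolist()))
```

With the sample data above, the dictionary ends up with the three largest-magnitude entries. If several values tie with the threshold, more than n entries can be kept.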