Most common term in a vector - PARI/GP


I feel like I'm being really stupid here as I would have thought there's a simple command already in Pari, or it should be a simple thing to write up, but I simply cannot figure this out.

Given a vector, say V, which will have duplicate entries, how can one determine what the most common entry is?

For example, say we have: V = [ 0, 1, 2, 2, 3, 4, 6, 8, 8, 8 ]

I want something which would return the value 8.

I'm aware of things like vecsearch, but I can't see how that can be tweaked to make this work?


Very closely related to this, I want this result to return the most common non-zero entry, and some vectors I look at will have 0 as the most common entry. Eg: V = [ 0, 0, 0, 0, 3, 3, 5 ]. So whatever I execute here I would like to return 3. I tried writing up something which would remove all zero terms, but again struggled.

The thing I have tried in particular is:

rem( v ) = {
my( c );
while( c = vecsearch( v, 0 ); #c, v = vecextract( v, "^c" ) ); v
}

but vecextract doesn't seem to like this setup.


Solution

  • If you can ensure that all the elements are integers within some fixed range, then a counting sort is enough. In PARI/GP it looks like this:

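    counts_for(v: t_VEC, lower: t_INT, upper: t_INT) = {
        \\ counts[k] holds the number of occurrences of the value lower+k-1
        my(counts = vector(1+upper-lower));
    
        for(i=1, #v, counts[1+v[i]-lower]++);
        \\ pair each value with its count as [value, count]; the offset by
        \\ lower keeps the value correct when the range does not start at 0
        vector(#counts, i, [lower+i-1, counts[i]])
    };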
    
    V1 = [0, 1, 2, 2, 3, 4, 6, 8, 8, 8];
    vecsort(counts_for(V1, 0, 8), [2], 4)[1][1]
    > 8
    
    V2 = [0, 0, 0, 0, 3, 3, 5];
    vecsort(counts_for(V2, 0, 5), [2], 4)[1][1]
    > 0
    

    You can also define the following shortcuts for convenience:

    counts_for1(v: t_VEC) = {
        counts_for(v, vecmin(v), vecmax(v))
    };
    
    most_frequent(v: t_VEC) = {
        my(counts=counts_for1(v));
        vecsort(counts, [2], 4)[1][1]
    };
    
    most_frequent(V1)
    > 8
    
    most_frequent(V2)
    > 0
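
The last call returns 0, whereas the question asks for the most common non-zero entry of V2, which is 3. A minimal sketch of one way to get that is to drop the zeros with the built-in select and then count each distinct remaining value. The name most_frequent_nonzero is hypothetical (it is not part of the answer above), and breaking ties in favour of the smallest value is an assumption of this sketch, not something the question specifies. Unlike the counting sort, it does not need the entries to lie in a small integer range.

```gp
\\ Hypothetical helper: most common non-zero entry of v.
\\ Ties are broken in favour of the smallest value (an assumption).
most_frequent_nonzero(v) = {
    my(w = select(x -> x != 0, v), u, best, bestcount = 0, c);

    if(#w == 0, error("vector has no non-zero entries"));
    u = Set(w);                          \\ distinct values, sorted ascending
    for(i = 1, #u,
        c = #select(x -> x == u[i], w);  \\ occurrences of u[i] in w
        if(c > bestcount, bestcount = c; best = u[i]));
    best
};

most_frequent_nonzero([0, 0, 0, 0, 3, 3, 5])           \\ 3
most_frequent_nonzero([0, 1, 2, 2, 3, 4, 6, 8, 8, 8])  \\ 8
```

The select call also replaces the rem attempt from the question: select(x -> x != 0, v) returns v without its zero entries in one step. Counting this way takes time proportional to the number of distinct values times the length of the vector, which is fine for short vectors; for long vectors whose entries are integers in a small range, the counting sort above is the faster option.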