I feel like I'm being really stupid here as I would have thought there's a simple command already in Pari, or it should be a simple thing to write up, but I simply cannot figure this out.
Given a vector, say V, which will have duplicate entries, how can one determine what the most common entry is?
For example, say we have:
V = [ 0, 1, 2, 2, 3, 4, 6, 8, 8, 8 ]
I want something which would return the value 8.
I'm aware of things like vecsearch
, but I can't see how that can be tweaked to make this work?
Very closely related to this, I want this result to return the most common non-zero entry, and some vectors I look at will have 0 as the most common entry. Eg: V = [ 0, 0, 0, 0, 3, 3, 5 ]
. So whatever I execute here I would like to return 3.
I tried writing up something which would remove all zero terms, but again struggled.
The thing I have tried in particular is:
rem( v ) = {
my( c );
while( c = vecsearch( v, 0 ); #c, v = vecextract( v, "^c" ) ); v
}
but vecextract
doesn't seem to like this set up.
If you can ensure all the elements are within the some fixed range then it is enough just to do the counting sorting with PARI/GP code like this:
counts_for(v: t_VEC, lower: t_INT, upper: t_INT) = {
my(counts = vector(1+upper-lower));
for(i=1, #v, counts[1+v[i]-lower]++);
vector(#counts, i, [i-1, counts[i]])
};
V1 = [0, 1, 2, 2, 3, 4, 6, 8, 8, 8];
vecsort(counts_for(V1, 0, 8), [2], 4)[1][1]
> 8
V2 = [0, 0, 0, 0, 3, 3, 5];
vecsort(counts_for(V2, 0, 5), [2], 4)[1][1]
> 0
You also can implement the following short-cut for the sake of convenience:
counts_for1(v: t_VEC) = {
counts_for(v, vecmin(v), vecmax(v))
};
most_frequent(v: t_VEC) = {
my(counts=counts_for1(v));
vecsort(counts, [2], 4)[1][1]
};
most_frequent(V1)
> 8
most_frequent(V2)
> 0