Search code examples
c++algorithmsortingvectorfrequency

c++ - sorting a vector of custom structs based on frequency


I need to find the most frequent element in an array of custom structs. There is no custom ID to them just matching properties.

I was thinking of sorting my vector by frequency but I have no clue how to do that.


Solution

  • I'm assuming by frequency you mean the number of times an identical structure appears in the array.

    You probably want to make a hash function (or overload std::hash<> for your type) for your custom struct. Then iterate over your array, incrementing the value on an unordered_map<mytype, int> for every struct in the array. This will give you the frequency in the value field. Something like the below would work:

    std::array<mytype> elements;
    std::unordered_map<mytype, int> freq;
    mytype most_frequent;
    int max_frequency = 0;
    for (const mytype &el : elements) {
        freq[el]++;
        if (freq[el] > max_frequency) {
            most_frequent = el;
        }
    }
    

    For this to work, the map will need to be able to create a hash for the above function. By default, it tries to use std::hash<>. You are expressly allowed by the standard to specialize this template in the standard namespace for your own types. You could do this as follows:

    struct mytype {
        std::string name;
        double value;
    };
    namespace std {
        template <> struct hash<mytype> {
            size_t operator()(const mytype &t) const noexcept {
                // Use standard library hash implementations of member variable types
                return hash<string>()(t.name) ^ hash<double>()(t.value)
            }
        }
    
    }
    

    The primary goal is to ensure that any two variables that do not contain exactly the same values will generate a different hash. The above XORs the results of the standard library's hash function for each type together, which according to Mark Nelson is probably as good as the individual hashing algorithms XOR'd together. An alternative algorithm suggested by cppreference's hash reference is the Fowler-Noll-Vo hash function.