Search code examples
javaarray-algorithms

Best way to find all elements in a list that are present more than k times


I just came across a problem and I was wondering what would be the best way to solve this.

I have a List of Lists

L = [[1, 2, 3, 4, 5, 6, 7], [2, 4, 6, 8, 10, 12], [3, 6, 9, 12, 15], ....]

Assuming the size of L is n, what would be the best way to find all the elements that are present k or more times in L?

For example, if k = 2, then I should get [2, 3, 4, 6, 12].


Solution

  • Assuming the size of L is n, what would be the best way to find all the elements that are present k or more times in L?

    Traditional way is to iterate through each list one time and collect times values in HashMap<Integer, Integer> (where key is a number, value is times). Then you need just to collect all the keys from map which values are k or more:

     public static List<Integer> getResultListByMap(List<List<Integer>> inputList, int k) {
        Map<Integer, Integer> times = new HashMap<>();
        for (List<Integer> integers : inputList) {
            for (Integer integer : integers) {
                if (times.keySet().contains(integer)) {
                    times.put(integer, times.get(integer) + 1);
                } else {
                    times.put(integer, 1);
                }
            }
        }
    
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : times.entrySet()) {
            if (entry.getValue() >= k) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
    

    result list contains all the numbers which are presented in lists k or more times

    UPDATE: OK, I've got that you use HashMap approach already and it is slow for you. I wrote an algorithm with Java 8 Stream API features which uses lists concatenation, sorting and gains bonuses from parallelism:

    public static List<Integer> getResultListBySort(List<List<Integer>> inputList, int k) {
        List<Integer> newList = inputList.parallelStream()
                .flatMap(l -> l.parallelStream()).sorted().collect(Collectors.toList());
    
        List<Integer> result = new ArrayList<>();
    
        Integer prev = null;
        int sum = newList.get(0);
        for (Integer integer : newList) {
            if (integer.equals(prev)) {
                sum++;
            } else {
                if (sum >= k) {
                    result.add(integer);
                }
                sum = 1;
            }
            prev = integer;
        }
        return result;
    }
    

    It is twice as fast for 2000 x 2000 problem size - 2000 lists with 2000 elements (now it takes only half a second to get the result list on my PC)

    Benchmark                       Mode  Samples  Score  Score error  Units
    c.c.b.MyBenchmark.testMap       avgt       20  0,972        0,030   s/op
    c.c.b.MyBenchmark.testSorted    avgt       20  0,534        0,005   s/op