I just came across a problem and I was wondering what would be the best way to solve this.
I have a List of Lists
L = [[1, 2, 3, 4, 5, 6, 7], [2, 4, 6, 8, 10, 12], [3, 6, 9, 12, 15], ....]
Assuming the size of L is n, what would be the best way to find all the elements that are present k or more times in L?
For example, if k = 2
, then I should get
[2, 3, 4, 6, 12]
.
Assuming the size of L is n, what would be the best way to find all the elements that are present k or more times in L?
Traditional way is to iterate through each list one time and collect times values in HashMap<Integer, Integer>
(where key is a number, value is times). Then you need just to collect all the keys from map which values are k
or more:
public static List<Integer> getResultListByMap(List<List<Integer>> inputList, int k) {
Map<Integer, Integer> times = new HashMap<>();
for (List<Integer> integers : inputList) {
for (Integer integer : integers) {
if (times.keySet().contains(integer)) {
times.put(integer, times.get(integer) + 1);
} else {
times.put(integer, 1);
}
}
}
List<Integer> result = new ArrayList<>();
for (Map.Entry<Integer, Integer> entry : times.entrySet()) {
if (entry.getValue() >= k) {
result.add(entry.getKey());
}
}
return result;
}
result
list contains all the numbers which are presented in lists k
or more times
UPDATE: OK, I've got that you use HashMap
approach already and it is slow for you. I wrote an algorithm with Java 8 Stream API features which uses lists concatenation, sorting and gains bonuses from parallelism:
public static List<Integer> getResultListBySort(List<List<Integer>> inputList, int k) {
List<Integer> newList = inputList.parallelStream()
.flatMap(l -> l.parallelStream()).sorted().collect(Collectors.toList());
List<Integer> result = new ArrayList<>();
Integer prev = null;
int sum = newList.get(0);
for (Integer integer : newList) {
if (integer.equals(prev)) {
sum++;
} else {
if (sum >= k) {
result.add(integer);
}
sum = 1;
}
prev = integer;
}
return result;
}
It is twice as fast for 2000 x 2000
problem size - 2000 lists with 2000 elements (now it takes only half a second to get the result list on my PC)
Benchmark Mode Samples Score Score error Units
c.c.b.MyBenchmark.testMap avgt 20 0,972 0,030 s/op
c.c.b.MyBenchmark.testSorted avgt 20 0,534 0,005 s/op