How to count the duplicate lines in a file and find the most duplicated line?


Or, even better, how can I show how many times each element is repeated in the map? The map was created like this:

fun prirazovac() {
    var lineNumber = 0

    File("src/60.ips.txt").forEachLine {
        lineNumber++
        val ipcode = mutableMapOf(lineNumber to it)
        for (ii in 1..200) {
            for (i in 200 downTo 1) {
                val truth = (ipcode.get(ii)== ipcode.get(i))
                if (truth) {
                    println(ipcode)
                }
            }

        }
    }
}

60.ips.txt:

66.249.64.33
66.249.64.124
66.249.76.13
66.249.76.11
142.54.183.122
142.54.183.122
180.76.15.162
173.234.153.122
173.234.153.122
173.234.153.122
173.234.153.122
180.76.15.154
180.76.15.33
66.249.76.110
66.249.76.109
46.119.118.233
46.119.118.233
46.119.118.233
207.46.13.231
207.46.13.231
40.77.167.29
52.3.127.144
66.249.64.33
66.249.76.109
63.249.66.212
63.249.66.212
207.46.13.237
207.46.13.237
40.77.167.29
40.77.167.29
157.55.39.251
207.46.13.142
66.249.76.9
40.77.167.7
157.55.39.251
157.55.39.251
157.55.39.251
157.55.39.251
157.55.39.251
207.46.13.142
207.46.13.142
198.204.240.219
198.204.240.219
68.180.231.40
68.180.231.40
66.249.64.124
139.167.180.171
139.167.180.171
52.3.127.144
217.69.133.169
66.249.76.13
131.161.8.209
223.16.201.219
223.16.201.219
68.180.231.40
162.210.196.97
162.210.196.97
106.75.74.148
106.75.74.148
106.75.74.148
137.226.158.12
137.226.158.12
106.75.74.148
106.75.74.148
123.125.71.53
178.255.215.84
178.255.215.84
66.249.76.9
63.249.66.212
63.249.66.212
63.249.66.212
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
142.54.183.122
142.54.183.122
66.249.76.109
151.80.31.167
51.255.65.21
202.46.58.80
84.185.64.239
84.185.64.239
178.255.215.84
178.255.215.84
52.3.127.144
180.76.15.21
66.249.64.20
66.249.76.127
80.112.180.113
66.249.76.109
180.76.15.6
223.16.201.219
223.16.201.219
84.121.51.229
84.121.51.229
123.125.71.79
157.55.39.251
217.69.133.253
217.69.133.252
92.204.106.99
188.251.22.226
80.183.10.116
68.180.228.62
68.180.228.62
173.208.211.250
173.208.211.250
66.249.65.158
180.76.15.6
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
68.180.228.62
180.76.15.6
173.208.211.250
173.208.211.250
5.248.253.78
5.248.253.78
5.248.253.78
123.125.71.95
92.204.106.99
93.95.103.45
52.3.127.144
52.3.127.144
68.180.228.62
163.172.66.14
190.200.185.85
190.200.185.85
157.55.39.251
157.55.39.113
180.76.15.137
180.76.15.25
92.204.106.99
66.249.73.136
46.229.167.149
46.229.167.149
46.229.167.149
92.229.161.46
92.204.106.99
92.204.106.99
92.204.106.99
66.249.65.158
66.249.65.154
207.46.13.141
207.46.13.141
207.46.13.141
173.208.211.250
173.208.211.250
66.249.73.131
66.249.73.131
163.172.14.55
178.255.215.84
91.64.61.78
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
87.78.248.247
87.78.248.247
69.64.40.177
223.16.201.219
223.16.201.219
63.249.66.212
63.249.66.212
178.137.95.202
178.137.95.202
178.137.95.202
92.204.106.99

It prints out thousands of results. I need each result only once and, ideally, with the number of times it occurs, for example: IP address - 20 times. I thought HashMap() would help, but it doesn't. Any ideas?


Solution

  • Kotlin's standard library has two functions that do exactly what you want, groupingBy and eachCount:

    import java.io.File
    
    fun main() {
        File("src/60.ips.txt")
            .readLines()
            .groupingBy { it }
            .eachCount()
            .forEach { (ip, count) -> println("$ip -> $count times") }
    }
    

    Partial output:

    66.249.64.33 -> 2 times
    66.249.64.124 -> 2 times
    66.249.76.13 -> 2 times
    66.249.76.11 -> 1 times
    142.54.183.122 -> 4 times
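
    Since the question mentions HashMap, here is a minimal sketch of the same counting done by hand. The code in the question creates a new one-entry map for every line, so nothing ever accumulates; the sketch keeps a single map for the whole input instead. The name countLines and the sample list are made up for illustration and are not part of the original answer.

```kotlin
// Hypothetical helper (not from the original answer): counts how often each line occurs.
fun countLines(lines: List<String>): Map<String, Int> {
    // One map for the whole input; LinkedHashMap keeps keys in first-seen order.
    val counts = LinkedHashMap<String, Int>()
    for (line in lines) {
        // Start from 0 when the line has not been seen yet, then add one.
        counts[line] = (counts[line] ?: 0) + 1
    }
    return counts
}

fun main() {
    // Small made-up sample instead of the real file.
    val sample = listOf(
        "66.249.64.33", "142.54.183.122", "142.54.183.122",
        "66.249.64.33", "66.249.76.11"
    )
    countLines(sample).forEach { (ip, count) -> println("$ip -> $count times") }
}
```

    For real input you could pass File("src/60.ips.txt").readLines() instead of the sample list. The groupingBy/eachCount version is shorter and is probably the better choice unless you need custom logic inside the loop.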
    

    To find the most frequent line, you can use maxByOrNull (available since Kotlin 1.4):

    File("src/60.ips.txt")
        .readLines()
        .groupingBy { it }
        .eachCount()
        .maxByOrNull { it.value }
        ?.let { (ip, count) -> println("IP $ip appeared the most: $count times") }
    

    Output:

    IP 46.246.39.81 appeared the most: 17 times
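
    If you only care about the lines that actually repeat, one possible extension is to drop the entries with a count of 1 and sort the rest by count. This is a sketch under assumptions: the helper name duplicatesByFrequency is hypothetical, and the "ip - N times" format is taken from the example in the question.

```kotlin
import java.io.File

// Hypothetical helper: returns (line, count) pairs for lines seen more than once,
// most frequent first.
fun duplicatesByFrequency(lines: List<String>): List<Pair<String, Int>> =
    lines
        .groupingBy { it }
        .eachCount()
        .filterValues { it > 1 }           // drop lines that occur only once
        .toList()                          // Map<String, Int> -> List<Pair<String, Int>>
        .sortedByDescending { it.second }  // highest count first

fun main() {
    // Same file path as in the question.
    duplicatesByFrequency(File("src/60.ips.txt").readLines())
        .forEach { (ip, count) -> println("$ip - $count times") }
}
```

    For the file shown in the question, the first line printed should be 46.246.39.81 - 17 times, which matches the maxByOrNull result above.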