javascript node.js stream fs readline

Count Duplicate Lines from File using node.js


I have to read a large .csv file line by line, take the first column (which holds country names), and count how many times each country appears. For example, if the file contains:

USA
UK
USA

the output should be:

USA - 2
UK - 1

code:

const fs = require('fs')
const readline = require('readline')

const file = readline.createInterface({
    input: fs.createReadStream('file.csv'),
    output: process.stdout,
    terminal: false
})

file.on('line', line => {
    const country = line.split(",", 1)
    const number = ??? // don't know how to check duplicates
    const result = country + number

    if(lineCount >= 1 && country != `""`) {
        console.log(result)
    }
    lineCount++
})

Solution

  • For starters, String.prototype.split returns an array, not a string. Since you limit the split to one element, you want the first value of that array. You can read about it here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split

    Next, you can create a map of all the countries that stores the number of times each one was seen, then log the results once the file has finished being read:

    
    const countries = {}
    let lineCount = 0
    file.on('line', line => {
        // Destructure the array and grab the first value
        const [firstColumn] = line.split(",", 1)
        // Trim once so that "USA" and " USA " are counted under the same key
        const country = firstColumn.trim()
        // Skip the header row (line 0) and empty values
        if (lineCount >= 1 && country !== "") {
            // Start at 1 the first time a country is seen, otherwise increment
            countries[country] = (countries[country] || 0) + 1
        }
        lineCount++
    })
    
    // Add another event listener for when the file has finished being read
    // You may access the country data here, since this callback function
    // won't be called till the file has been read
    // https://nodejs.org/api/readline.html#event-close
    file.on('close', () => {
        for (const country in countries) {
            console.log(`${country} - ${countries[country]}`)
        }
    })
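
For reference, the same idea can be sketched as a self-contained function. This is only one possible arrangement, not the code from the answer above: the helper name countFirstColumn and its filePath parameter are made up for illustration, it assumes a Node.js version where a readline interface can be consumed with for await, and it keeps the assumption that the first line of the file is a header row. It uses a Map instead of a plain object so that a value such as "constructor" in the first column cannot collide with properties inherited by {}.

```javascript
const fs = require('fs')
const readline = require('readline')

// Hypothetical helper (name and signature are illustrative, not from the answer):
// resolves to a Map of first-column value -> number of occurrences.
async function countFirstColumn(filePath) {
    const rl = readline.createInterface({
        input: fs.createReadStream(filePath),
        crlfDelay: Infinity // treat \r\n as a single line break
    })

    const counts = new Map()
    let lineCount = 0
    // The interface is async iterable, so each line arrives one at a time
    for await (const line of rl) {
        const country = line.split(',', 1)[0].trim()
        // Assumption: line 0 is a header row; empty values are ignored
        if (lineCount >= 1 && country !== '') {
            counts.set(country, (counts.get(country) || 0) + 1)
        }
        lineCount++
    }
    return counts
}
```

Calling countFirstColumn('file.csv') returns a promise. Once it resolves, iterating the Map with for (const [country, count] of counts) yields the pairs to print in the "USA - 2" format, in the order the countries were first seen.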