
Simple way to count occurrences in an array and get the top values (bag of words)


Hi, I've been looking for a way to build a simple bag-of-words model in JavaScript. I have spent some time looking at examples, but from what I have seen most of them require Node.js or Browserify to be installed. I am simply trying to read text, split it into words, and get the most frequently used words. However, I'm having trouble getting JavaScript's array methods to give me back the words themselves: so far I can only return the numbers (the word counts), not the text values.

function bagOfWords(text) {
  text = text.toLowerCase(); //make everything lower case
  var bag = text.split(" "); //split the text into words on spaces

  //count duplicates: map each word to its number of occurrences
  var map = bag.reduce(function(prev, cur) {
    prev[cur] = (prev[cur] || 0) + 1;
    return prev;
  }, {});

  //keep only the counts (this is where the words themselves are lost)
  var arr = Object.keys(map).map(function(key) { return map[key]; });
  arr = arr.sort(sortNumber); //sort the counts in ascending numeric order

  var top10 = []; //the final array storing the top 10 counts
  for (var i = arr.length - 1; i >= 0 && top10.length < 10; i--) {
    top10.push(arr[i]); //walk backwards from the largest count
  }
  return top10;
}

//numeric comparator for sort(); the default sort compares values as strings
function sortNumber(a, b) {
  return a - b;
}

Is there a simpler way, using the reduce method, to find, count, and return the top 10 words without having to iterate over the indexes and reference the original text input (and without creating new sorted arrays)?
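In the question's code, the words are lost at the `Object.keys(map).map(...)` step, which keeps each `map[key]` count and throws the key away. A minimal sketch of a direct fix, assuming the same space-separated input, sorts the keys by their counts instead of sorting the counts; `topWords` and its `n` parameter are hypothetical names, not from the question:

```javascript
// Hypothetical helper: return the n most frequent words in `text`.
function topWords(text, n) {
  // Count occurrences, exactly as in the question's reduce call.
  var map = text.toLowerCase().split(" ").reduce(function (prev, cur) {
    prev[cur] = (prev[cur] || 0) + 1;
    return prev;
  }, {});

  // Sort the words (keys) by their counts, highest first, then keep n of them.
  return Object.keys(map)
    .sort(function (a, b) { return map[b] - map[a]; })
    .slice(0, n);
}

console.log(topWords("b a b c b a", 2)); // [ 'b', 'a' ]
```

This still creates one sorted array (of the distinct words), which the question hoped to avoid, but it leaves the counting step exactly as written.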


Solution

  • I don't know if there's a good reduce solution to the problem, but I've come up with an algorithm:

    1. Sort all the words, and clone this array.
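    2. Sort the sorted list of words in descending order of occurrence, using lastIndexOf() and indexOf() on the cloned array. Because the clone is sorted, all copies of a word are adjacent, so lastIndexOf(word) - indexOf(word) + 1 is the number of times that word occurs.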
    3. filter() the new array to remove duplicates.
    4. slice() the filtered array to limit it to the first 10 words.
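Step 2 relies on the fact that, in a sorted array, every copy of a word sits in one contiguous run, so the width of that run is the word's count. A small check of the idea, using made-up sample words and a hypothetical `occurrences` helper:

```javascript
// In a sorted array, equal words are adjacent, so the width of a word's
// run (last position - first position + 1) equals its number of occurrences.
// Only meaningful for words that are actually present in the array.
var sortedWords = "ten nine ten eleven ten nine".split(" ").sort();
// sortedWords is [ 'eleven', 'nine', 'nine', 'ten', 'ten', 'ten' ]

function occurrences(word) {
  return sortedWords.lastIndexOf(word) - sortedWords.indexOf(word) + 1;
}

console.log(occurrences("ten"));    // 3
console.log(occurrences("nine"));   // 2
console.log(occurrences("eleven")); // 1
```

Each call scans the array, so a sort comparator built from these calls does linear work per comparison.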

    Snippet:

    function bagOfWords(text) {
      var bag = text.
                  toLowerCase().
                  split(/\s+/).     //split on runs of whitespace
                  filter(Boolean).  //drop empty strings, e.g. from a trailing space
                  sort(),
          clone = bag.slice();  //needed because sort changes the array in place

      return bag.
               sort(function(a, b) { //sort in descending order of occurrence
                 //clone is sorted, so lastIndexOf - indexOf + 1 is a word's count
                 return (clone.lastIndexOf(b) - clone.indexOf(b) + 1) -
                        (clone.lastIndexOf(a) - clone.indexOf(a) + 1);
               }).
               filter(function(word, idx) { //remove duplicates, keeping the first copy
                 return bag.indexOf(word) === idx;
               }).
               slice(0, 10);  //first 10 elements
    } //bagOfWords
    
    console.log(bagOfWords('four eleven two eleven ten nine one six seven eleven nine ten seven four seven six eleven nine five ten seven six eleven nine seven three five ten eleven six nine two five seven ten eleven nine six three eight eight eleven nine ten eight three eight five eleven eight ten nine four four eight eleven ten five eight six seven eight nine ten ten eleven '));
    
    console.log(bagOfWords('Four score and seven years ago our fathers brought forth on this continent a new nation conceived in Liberty and dedicated to the proposition that all men are created equal Now we are engaged in a great civil war testing whether that nation or any nation so conceived and so dedicated can long endure We are met on a great battle-field of that war We have come to dedicate a portion of that field as a final resting place for those who here gave their lives that that nation might live It is altogether fitting and proper that we should do this But in a larger sense we can not dedicate we can not consecrate we can not hallow this ground The brave men living and dead who struggled here have consecrated it far above our poor power to add or detract The world will little note nor long remember what we say here but it can never forget what they did here It is for us the living rather to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced It is rather for us to be here dedicated to the great task remaining before us that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion that we here highly resolve that these dead shall not have died in vain that this nation under God shall have a new birth of freedom and that government of the people by the people for the people shall not perish from the earth'));
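The question asks for a reduce-based approach that avoids sorting a new array. One option, sketched here as an alternative rather than as the answer's method, is to count once with reduce and then pick the winners in a second reduce that maintains only a short array of at most k entries ordered by count. It still builds an array, but one of at most k entries instead of a sorted copy of every word, and it avoids the repeated indexOf()/lastIndexOf() scans inside the sort comparator above, which get slow on long texts. The `topK` name, the `{ word, count }` result shape and the whitespace tokenization (`/\s+/`) are assumptions of this sketch:

```javascript
// Hypothetical helper: the k most frequent words in `text`, as { word, count }
// objects, highest count first. Ties keep Object.keys() order.
function topK(text, k) {
  // Split on runs of whitespace and drop empty strings (e.g. from a trailing space).
  var words = text.toLowerCase().split(/\s+/).filter(Boolean);

  // Pass 1: count occurrences. Object.create(null) avoids inherited keys
  // such as "constructor" being mistaken for existing counts.
  var counts = words.reduce(function (acc, w) {
    acc[w] = (acc[w] || 0) + 1;
    return acc;
  }, Object.create(null));

  // Pass 2: keep a short list of at most k entries, sorted by count
  // (descending), inserting each word at its place if it qualifies.
  return Object.keys(counts).reduce(function (top, w) {
    var c = counts[w];
    var i = top.length;
    // Walk left past every entry with a strictly smaller count.
    while (i > 0 && top[i - 1].count < c) i--;
    if (i < k) {
      top.splice(i, 0, { word: w, count: c });
      if (top.length > k) top.pop(); // drop the entry pushed past position k
    }
    return top;
  }, []);
}

console.log(topK("b a b c b a d", 2));
// [ { word: 'b', count: 3 }, { word: 'a', count: 2 } ]
```

The second pass costs roughly O(V·k) for V distinct words, which is cheap for a small k such as 10.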