mapreduce, bigdata, cloud, amazon-emr, pseudocode

Pseudocode to find the number of occurrences of each character in a document


I am trying to write pseudocode for a MapReduce job that counts the number of occurrences of each character in a document. For example:

m: 1000 times, M: 5000 times, "": 3000 times, \n: 100 times, .: 20000 times, etc.

Can someone please tell me whether this is correct, or how I can improve it?

I have written the pseudocode shown below:

def Map(documentName, documentContent)
  For Character in documentContent
    EmitIntermediate(Character, 1)


def Reduce(Character, Counts)
  Char_Count = 0
  For count in Counts
    Char_Count += count
  Emit(Character, Char_Count)

I referred to some of the MapReduce pseudocode available online when writing this. For example, the following pseudocode counts the number of occurrences of each word in a document:

def map(documentName, documentContent):
  for line in documentContent:
    words = line.split(" ")
    for word in words:
      EmitIntermediate(word, 1)

def reduce(word, counts):
  wordCount = 0
  for count in counts:
    wordCount += count
  Emit(word, wordCount)

Solution

def Map(documentName, documentContent)
  For line in documentContent
    Line_String = line
    For Character in Line_String
      EmitIntermediate(Character, 1)


def Reduce(Character, Counts)
  Char_Count = 0
  For count in Counts
    Char_Count += count
  Emit(Character, Char_Count)
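
To sanity-check the logic without a cluster, the Map and Reduce pseudocode above can be simulated locally. The following is only a minimal Python sketch, not a real Hadoop or EMR job: the names map_chars, reduce_counts, and run_local are hypothetical, and run_local stands in for the shuffle phase that an actual MapReduce framework would perform between the two steps.

```python
from collections import defaultdict


def map_chars(document_name, document_content):
    """Map step: emit (character, 1) for every character, including spaces and newlines."""
    for character in document_content:
        yield character, 1


def reduce_counts(character, counts):
    """Reduce step: sum the partial counts collected for a single character."""
    return character, sum(counts)


def run_local(documents):
    """Hypothetical local driver: groups mapper output by key, then calls the reducer.

    A real framework does this grouping (the shuffle) for you; this is an assumption
    made purely so the example runs on one machine.
    """
    grouped = defaultdict(list)
    for name, content in documents.items():
        for key, value in map_chars(name, content):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Two tiny made-up documents, just to exercise the functions.
char_counts = run_local({"doc1.txt": "Mm m.\n", "doc2.txt": "M."})
print(char_counts)
```

Note that this sketch iterates over the whole document content, so newline characters are counted too. If the input is instead read line by line with line endings stripped, \n would never reach the mapper.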