
What is the int needed for in map(int, icounts) in Pydoop?


In the official Pydoop tutorial there is a word count example.

I understand how it works, but I am wondering about the inner workings of map(int, icounts).

Do I follow correctly that icounts is a list of 1s? Where does the int come from and why map?

# Compute the word frequency

import pydoop

def mapper(_, text, writer):
    for word in text.split():
        writer.emit(word, "1")

def reducer(word, icounts, writer):
    writer.emit(word, sum(map(int, icounts)))

Solution

  • A type like int can be applied as a function to a value to convert that value to an integer (if it supports that conversion).

    For example:

    s = '1'
    i = int(s)  # `i` will be `1`, where `s` is `'1'`
    

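    icounts in your example is likely an iterable (like a list) of string values, since the mapper emits the string "1" for every word. map applies int to each element in turn, producing an iterable of integer values, which sum can then add up. Without the conversion, sum would fail, because it cannot add strings to its integer starting value of 0.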
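To see the effect outside Hadoop, here is a minimal sketch in plain Python. The list assigned to icounts is a made-up stand-in for what the reducer might receive for one word; it is an assumption for illustration, not actual Pydoop output.

```python
# Hypothetical stand-in for the values the reducer receives for one word:
# the mapper emitted the string "1" each time it saw that word.
icounts = ["1", "1", "1"]

# sum() starts from the integer 0, so adding a string to it raises TypeError.
try:
    sum(icounts)
except TypeError as exc:
    print("summing strings fails:", exc)

# map(int, icounts) calls int() on each element; list() materializes the result.
as_ints = list(map(int, icounts))

# The expression used in the reducer: convert each count, then add them up.
total = sum(map(int, icounts))

# An equivalent spelling with a generator expression.
total_genexp = sum(int(c) for c in icounts)

print(as_ints, total, total_genexp)
```

In Python 3, map returns a lazy iterator rather than a list, which is fine here because sum consumes it element by element.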