
In Python, what would be a clear, efficient way to count things in regions?


I am looping over objects called events. Each event has a particular object in it. I am calculating the fraction of objects that have a particular characteristic. Imagine the approach as being something like the following:

countCars = 0
countCarsBlue = 0
for event in events:
    countCars += 1
    if event.car.isBlue():
        countCarsBlue += 1

# float() avoids integer division under Python 2
print("fraction of cars that are blue: {fraction}".format(
    fraction = float(countCarsBlue) / countCars))

Now, imagine that I want to calculate that fraction separately within ranges of another characteristic of the object. In my example, I am counting the fraction of cars that are blue. I now want the fraction of cars that are blue among cars with lengths from 0 m to 1 m, from 1 m to 2 m, from 2 m to 3 m, from 3 m to 4 m, and so on.

Given that I am dealing with a lot of statistics and many more bins than the 4 bins of my simple example, what would be a good way to structure the code for this type of calculation, assuming a constant bin width?

(Would there be a sensible way to do this for variable bin widths?)


Solution

  • First, some code to recreate your example:

    import random
    
    class Event(object):
        def __init__(self):
            self.car = None
    
    class Car(object):
        def __init__(self, isBlue, length):
            self._isBlue = isBlue
            self._length = length
    
        def isBlue(self):
            return self._isBlue
    
        def length(self):
            return self._length
    
        def __str__(self):
            return '{} car of {} m long.'.format('blue' if self.isBlue() else 'non-blue ', self.length())
    

    OK, now I randomly create ten car objects and add each one to its own event:

    totalNumberOfCars = 10
    events = []
    for _ in range(totalNumberOfCars):
        car = Car(random.choice([True, False]), random.randrange(5, 40)/10.)
        print(car)
        event = Event()
        event.car = car
        events.append(event)
    

    For me, the output was as follows (your output can of course be different):

    blue car of 0.5 m long.
    non-blue  car of 2.3 m long.
    non-blue  car of 3.8 m long.
    blue car of 2.1 m long.
    non-blue  car of 0.6 m long.
    blue car of 0.8 m long.
    blue car of 0.5 m long.
    blue car of 2.3 m long.
    blue car of 3.3 m long.
    blue car of 2.1 m long.
    

    Now, to count the events by region, you could do it as follows:

    allBlueCars = sum(1 for event in events if event.car.isBlue())
    print("Number of blue cars: {}".format(allBlueCars))

    maxCarLen = 4
    # Pair each lower edge with the next upper edge: (0, 1), (1, 2), (2, 3), (3, 4)
    for minlen, maxlen in zip(range(maxCarLen), range(1, maxCarLen + 1)):
        print("Cars between {} and {} m that are blue:".format(minlen, maxlen))
        blueCarsInRegion = [str(event.car) for event in events
                            if event.car.isBlue() and minlen <= event.car.length() < maxlen]
        if blueCarsInRegion:
            print('\n'.join('\t{}'.format(car) for car in blueCarsInRegion))
        else:
            print('no blue cars in this region')
        # Share of all blue cars that fall in this length range (0.0 if there are none)
        fraction = float(len(blueCarsInRegion)) / allBlueCars if allBlueCars else 0.0
        print("fraction of cars that are blue and between {} and {} m long: {}\n".format(minlen, maxlen, fraction))
    

    For the above sample data, that would print:

    Number of blue cars: 7
    Cars between 0 and 1 m that are blue:
        blue car of 0.5 m long.
        blue car of 0.8 m long.
        blue car of 0.5 m long.
    fraction of cars that are blue and between 0 and 1 m long: 0.428571428571
    
    Cars between 1 and 2 m that are blue:
    no blue cars in this region
    fraction of cars that are blue and between 1 and 2 m long: 0.0
    
    Cars between 2 and 3 m that are blue:
        blue car of 2.1 m long.
        blue car of 2.3 m long.
        blue car of 2.1 m long.
    fraction of cars that are blue and between 2 and 3 m long: 0.428571428571
    
    Cars between 3 and 4 m that are blue:
        blue car of 3.3 m long.
    fraction of cars that are blue and between 3 and 4 m long: 0.142857142857
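
Note that the loop above divides by the total number of blue cars and rescans every event for each region. If what is wanted is the share of cars within each length range that are blue, and there are many events and bins, a single pass that tallies per-bin counts may be a better fit. The following is only a sketch under stated assumptions: the function names `count_fixed_bins` and `count_edge_bins` are hypothetical, and the cars are represented as plain `(is_blue, length)` pairs rather than the `Event`/`Car` classes. The second function uses the standard `bisect` module to handle variable bin widths.

```python
import bisect
from collections import defaultdict


def count_fixed_bins(cars, bin_width):
    """Tally [blue, total] per bin in one pass, for a constant bin width.

    `cars` is an iterable of (is_blue, length) pairs (an assumed stand-in
    for the Event/Car objects). Bin i covers [i * bin_width, (i + 1) * bin_width).
    Bins that contain no cars do not appear in the result.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [blue, total]
    for is_blue, length in cars:
        # Floor division picks the bin; assumes length >= 0.
        index = int(length // bin_width)
        counts[index][1] += 1
        if is_blue:
            counts[index][0] += 1
    return counts


def count_edge_bins(cars, edges):
    """Tally [blue, total] per bin for arbitrary, sorted bin edges.

    Bin i covers [edges[i], edges[i + 1]); lengths outside the edges are skipped.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]
    for is_blue, length in cars:
        # bisect_right finds the first edge greater than length,
        # so the bin index is one less than that position.
        index = bisect.bisect_right(edges, length) - 1
        if 0 <= index < len(counts):
            counts[index][1] += 1
            if is_blue:
                counts[index][0] += 1
    return counts


# The sample cars listed above, as (is_blue, length) pairs. With the classes
# from this answer, the pairs could be built as
# ((e.car.isBlue(), e.car.length()) for e in events).
cars = [(True, 0.5), (False, 2.3), (False, 3.8), (True, 2.1), (False, 0.6),
        (True, 0.8), (True, 0.5), (True, 2.3), (True, 3.3), (True, 2.1)]

fixed = count_fixed_bins(cars, bin_width=1.0)
for index in sorted(fixed):
    blue, total = fixed[index]
    print("bin {}: {} of {} cars are blue ({:.2f})".format(
        index, blue, total, float(blue) / total))

edges = [0, 1, 2, 4]  # variable widths: 1 m, 1 m, 2 m
variable = count_edge_bins(cars, edges)
for (low, high), (blue, total) in zip(zip(edges, edges[1:]), variable):
    share = float(blue) / total if total else float("nan")
    print("{} m to {} m: {} of {} cars are blue ({:.2f})".format(
        low, high, blue, total, share))
```

For the sample data, the constant-width tally finds 3 of 4 cars blue in the 0 m to 1 m bin, 3 of 4 in the 2 m to 3 m bin and 1 of 2 in the 3 m to 4 m bin. Both functions touch each car once, so the cost grows with the number of cars rather than with cars times bins. With non-integer bin widths, floating-point rounding in the floor division can put a length that sits exactly on an edge into the neighbouring bin, in which case the edge-based version may be the safer choice.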