Search code examples
cprobabilitycs50matchmaking

Matchmaking program in C?


The problem I am given is the following:

Write a program to discover the answer to this puzzle:"Let's say men and women are paid equally (from the same uniform distribution). If women date randomly and marry the first man with a higher salary, what fraction of the population will get married?"

From this site

My issue is that it seems that the percent married figure I am getting is wrong. Another poster asked this same question on the programmers exchange before, and the percentage getting married should be ~68%. However, I am getting closer to 75% (with a lot of variance). If anyone can take a look and let me know where I went wrong, I would be very grateful.

I realize, looking at the other question that was on the programmers exchange, that this is not the most efficient way to solve the problem. However, I would like to solve the problem in this manner before using more efficient approaches.

My code is below, the bulk of the problem is "solved" in the test function:

#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ARRAY_SIZE 100
#define MARRIED 1
#define SINGLE 0
#define MAX_SALARY 1000000

bool arrayContains(int* array, int val);
int test();

int main()
{
    printf("Trial count: ");
    int trials = GetInt();

    int sum = 0;
    for(int i = 0; i < trials; i++)
    {
        sum += test();
    }

    int average = (sum/trials) * 100;

    printf("Approximately %d %% of the population will get married\n", average / ARRAY_SIZE);
}

int test()
{
    srand(time(NULL));

    int femArray[ARRAY_SIZE][2];    
    int maleArray[ARRAY_SIZE][2];

    // load up random numbers   
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        femArray[i][0] = (rand() % MAX_SALARY);
        femArray[i][1] = SINGLE;

        maleArray[i][0] = (rand() % MAX_SALARY);
        maleArray[i][1] = SINGLE;
    }

    srand(time(NULL));
    int singleFemales = 0;

    for (int k = 0; k < ARRAY_SIZE; k++)
    {
        int searches = 0; // count the unsuccessful matches
        int checkedMates[ARRAY_SIZE] = {[0 ... ARRAY_SIZE - 1] = ARRAY_SIZE + 1};

        while(true)
        {
            // ARRAY_SIZE - k is number of available people, subtract searches for people left
            // checked all possible mates
            if(((ARRAY_SIZE - k) - searches) == 0)
            {
                singleFemales++;
                break;
            }

            int randMale = rand() % ARRAY_SIZE; // find a random male

            while(arrayContains(checkedMates, randMale)) // ensure that the male was not checked earlier
            {
                randMale = rand() % ARRAY_SIZE;               
            }
            checkedMates[searches] = randMale;

            // male has a greater income and is single            
            if((femArray[k][0] < maleArray[randMale][0]) && (maleArray[randMale][1] == SINGLE))
            {
                femArray[k][1] = MARRIED;
                maleArray[randMale][1] = MARRIED;
                break;
            }
            else
            {
                searches++;
                continue;
            }
        }
    }

    return ARRAY_SIZE - singleFemales;
}

bool arrayContains(int* array, int val)
{
    for(int i = 0; i < ARRAY_SIZE; i++)
    {
        if (array[i] == val)
            return true;
    }
    return false;
}

Solution

  • In the first place, there is some ambiguity in the problem as to what it means for the women to "date randomly". There are at least two plausible interpretations:

    1. You cycle through the unmarried women, with each one randomly drawing one of the unmarried men and deciding, based on salary, whether to marry. On each pass through the available women, this probably results in some available men being dated by multiple women, and others being dated by none.

    2. You divide each trial into rounds. In each round, you randomly shuffle the unmarried men among the unmarried women, so that each unmarried man dates exactly one unmarried woman.

    In either case, you must repeat the matching until there are no more matches possible, which occurs when the maximum salary among eligible men is less than or equal to the minimum salary among eligible women.

    In my tests, the two interpretations produced slightly different statistics: about 69.5% married using interpretation 1, and about 67.6% using interpretation 2. 100 trials of 100 potential couples each was enough to produce fairly low variance between runs. In the common (non-statistical) sense of the term, for example, the results from one set of 10 runs varied between 67.13% and 68.27%.

    You appear not to take either of those interpretations, however. If I'm reading your code correctly, you go through the women exactly once, and for each one you keep drawing random men until either you find one that that woman can marry or you have tested every one. It should be clear that this yields a greater chance for women early in the list to be married, and that order-based bias will at minimum increase the variance of your results. I find it plausible that it also exerts a net bias toward more marriages, but I don't have a good argument in support.

    Additionally, as I wrote in comments, you introduce some bias through the way you select random integers. The rand() function returns an int between 0 and RAND_MAX, inclusive, for RAND_MAX + 1 possible values. For the sake of argument, let's suppose those values are uniformly distributed over that range. If you use the % operator to shrink the range of the result to N possible values, then that result is still uniformly distributed only if N evenly divides RAND_MAX + 1, because otherwise more rand() results map to some values than map to others. In fact, this applies to any strictly mathematical transformation you might think of to narrow the range of the rand() results.

    For the salaries, I don't see why you even bother to map them to a restricted range. RAND_MAX is as good a maximum salary as any other; the statistics gleaned from the simulation don't depend on the range of salaries; but only on their uniform distribution.

    For selecting random indices into your arrays, however, either for drawing men or for shuffling, you do need a restricted range, so you do need to take care. The best way to reduce bias in this case is to force the random numbers drawn to come from a range that is evenly divisible by the number of options by re-drawing as many times as necessary to ensure it:

    /*
     * Returns a random `int` in the half-open interval [0, upper_bound).
     * upper_bound must be positive, and should not exceed RAND_MAX + 1.
     */
    int random_draw(int upper_bound) {
        /* integer division truncates the remainder: */
        int rand_bound = (RAND_MAX / upper_bound) * upper_bound;
    
        for (;;) {
            int r = rand();
    
            if (r < rand_bound) {
                return r % upper_bound;
            }
        }
    }