Search code examples
loopsif-statementsasnested-loops

Iterate through two datasets to create distinct results dataset


In SAS, I have the following two datasets:

Dataset #1: Data on people's meal preferences

   ID |  Meal   | Meal_rank
    1   Lobster       1
    1   Cake          2
    1   Hot Dog       3
    1   Salad         4
    1   Fries         5
    2   Burger        1
    2   Hot Dog       2
    2   Pizza         3
    2   Fries         4
    3   Hot Dog       1
    3   Salad         2
    3   Soup          3
    4   Lobster       1
    4   Hot Dog       2
    4   Burger        3

Dataset #2: Data on meal availability

  Meal   | Units_available
  Hot Dog     2
  Burger      1
  Pizza       2

In SAS, I'd like to find a way to derive a result dataset that looks as follows (without changing anything in Dataset #1 or #2):

   ID |  Assigned_Meal
    1   Hot Dog
    2   Burger
    3   Hot Dog
    4   Meal cannot be assigned (out of stock/unavailable)

The results are driven by a process that iterates through the meals of each person (identified by their 'ID' values) until either:

  1. A meal is found where there are enough units available.
  2. All meals have been checked against the availability data.

Notably:

  1. There are cases where the person lists a meal that isn't available.

The dataset I'm working with is much larger than in this example (thousands of rows).

Here is SAS code for creating the two sample datasets:

    proc sql;
       create table work.ppl_meal_pref
           (ID char(4),
            Meal char(20),
            Meal_rank num);

    insert into work.ppl_meal_pref
        values('1','Lobster',1)
        values('1','Cake',2)
        values('1','Hot Dog',3)
        values('1','Salad',4)
        values('1','Fries',5)
        values('2','Burger',1)
        values('2','Hot Dog',2)
        values('2','Pizza',3)
        values('2','Fries',4)
        values('3','Hot Dog',1)
        values('3','Salad',2)
        values('3','Soup',3)
        values('4','Lobster',1)
        values('4','Hot Dog',2)
        values('4','Burger',3)
        ;
    quit;
    run;

    proc sql;
       create table work.lunch_menu
           (FoodName char(14),
            Units_available num);

    insert into work.lunch_menu
        values('Hot Dog',2)
        values('Burger',1)
        values('Pizza',1)
        ;
    quit;
    run;

I've tried to implement loops to perform this task, but to no avail (see below).

        data work.assign_meals;

    length FoodName $ 14 Units_available 8;
    if (_n_ = 1) then do;
        declare hash lookup(dataset:'work.lunch_menu', duplicate: 'error', ordered: 'ascending', multidata: 'NO');
            lookup.defineKey('FoodName');
            lookup.defineData('Units_available');
            lookup.defineDone();
    end;

    do until (eof_pref);
        set work.ppl_meal_pref END = eof_pref;
        rc = lookup.FIND();
        IF rc ne 0 THEN DO;
            Units_available = 0;
        end;
    output;
    end;
    stop;
    run;

Solution

  • Here is a working hash based code using the sample data from ealfons1. Having different variable names for the key (Meal versus FoodName) mean you have to use extra syntax in the FIND() (or you could rename in the SET or DATASET specifiers)

    It will also output an updated stock level dataset. Tracking the not assigned condition, i.e. what preferences were run out / not stocked for each ID who did not get a meal assignment, would require extra code and output data.

    data meal_assignments;
      if 0 then set meals_stock; * prep PDV;
      declare hash stock (dataset:'meals_stock');
      stock.defineKey('FoodName');
      stock.defineData('FoodName', 'Units_available');
      stock.defineDone();
    
      do until (lastrow_flag);
        assigned = 0;
        stocked = 0;
        do until (last.ID);
          set ppl_meal_pref end=lastrow_flag;
          by ID Meal_rank; * error will happen if meal_rank is not monotonic;
          if assigned then continue; * alread assigned;
          if stock.find(key:Meal) ne 0 then continue; * off the menu;
          stocked = 1;
          if Units_available <  1 then continue; * out of stock or missing count;
          Units_available + (-1);
          if stock.replace() = 0 then do; * hash replace worked;
            assigned = 1;
            OUTPUT;
          end;
          else put 'WARNING: Problem with stock hash ' Meal=;
        end;
        if not assigned then do;
          if stocked then Meal = 'Ran out'; else Meal = 'Not stocked';
          OUTPUT;
        end;
      end;
    
      keep ID Meal;
    
      stock.output(dataset:'meals_stock_after_assignments');
    
      stop;
    run;
    
    options nocenter;
    title "Meals report";
    proc print noobs data=meal_assignments; title2 "Assignments";
    proc print noobs data=meals_stock_after_assignments; title2 "New stock levels";
    proc sql;
      title2 "Usage summary";
      select A.Meal, A.have_count, B.had_count, B.had_count - A.have_count as use_count
      from 
      (select FoodName as Meal, Units_available as have_count from meals_stock_after_assignments) as A
      join 
      (select FoodName as Meal, Units_available as had_count from meals_stock) as B
      on A.Meal = B.Meal
      ;
    quit;
    

    The 'want' here is queue based:

    • first come, first served by preference rank solution.
      • a random queue order over ID could deliver a modicum of perceived 'fairness'

    More difficult solutions would be based on global planning, such as:

    • serve most people, highest preference rank
    • serve most people, lowest cost
    • etc ...