Search code examples
stata

Creating a variable that uses vertical lookup in another variable


How could you create a variable that would indicate presence or non-presence (0/1) of a certain condition by looking through multiple observations in another variable? (Hence this vertical lookup is to be bounded by the time period.)

For example, in the sample code below, a new variable tag_prodcateg1 should equal to 1 for all categories (1, 2, and 3) only if category 1 was on display in a corresponding week.

clear
input str2 product  prodcategory    promo week tag_prodcateg1
A 1 1 1 1
B 1 0 1 1  
C 2 1 1 1
D 2 1 1 1
E 3 0 1 1
F 3 0 1 1
A 1 0 2 0
B 1 0 2 0 
C 2 1 2 0
D 2 1 2 0
E 3 0 2 0
F 3 0 2 0
X 4 1 3 .
Y 4 1 3 .

end

I am only able to go as far as doing it for all observations at once (i.e. not on a week-by-week basis):

egen uniqueid=group(prodcategory week)
bysort uniqueid: egen sumdisp=total(promo) if prodcategory==1
gen tagsubcatweekondisp=1 if sumdisp>0
replace tagsubcatweekondisp=0 if tagsubcatweekondisp==.

Solution

  • This seems to do the job.

    generate pc1p = prodcategory==1 & promo==1
    bysort week (product): egen tag1 = max(pc1p)
    

    which yields the following results

    . list, clean noobs
    
        product   prodca~y   promo   week   tag_pr~1   pc1p   tag1  
              A          1       1      1          1      1      1  
              B          1       0      1          1      0      1  
              C          2       1      1          1      0      1  
              D          2       1      1          1      0      1  
              E          3       0      1          1      0      1  
              F          3       0      1          1      0      1  
              A          1       0      2          0      0      0  
              B          1       0      2          0      0      0  
              C          2       1      2          0      0      0  
              D          2       1      2          0      0      0  
              E          3       0      2          0      0      0  
              F          3       0      2          0      0      0  
    

    The code above addresses the original problem: create a variable that is 1 for all observations in a given week if one of the observations was for productcategory = 1 and had promo = 1. Now the problem has been updated to include that the variable needs to be 2, rather than 0, if there were no observations for productcategory = 1 in the given week.

    The code above shows general technique for copying a result from one observation to other related observations. To solve the new problem, the same technique will apply. The code below does what I understand is wanted. But the lesson is that the same technique that allowed the solving of the original problem can be understood and used to solve further, essentially identical, problems.

    generate pc1p = prodcategory==1 & promo==1
    bysort week (product): egen tag1 = max(pc1p)
    generate pc2 = prodcategory==1
    bysort week (product): egen tag2 = max(pc2)
    replace tag1 = 2 if tag1==0 & tag2==0
    list, clean noobs
    

    which yields the following results, with tag1 now taking the value 2 for observations in week 3, when productcategory 1 does not appear.

    . list, clean noobs
    
    product   prodca~y   promo   week   tag_pr~1   pc1p   tag1   pc2   tag2  
          A          1       1      1          1      1      1     1      1  
          B          1       0      1          1      0      1     1      1  
          C          2       1      1          1      0      1     0      1  
          D          2       1      1          1      0      1     0      1  
          E          3       0      1          1      0      1     0      1  
          F          3       0      1          1      0      1     0      1  
          A          1       0      2          0      0      0     1      1  
          B          1       0      2          0      0      0     1      1  
          C          2       1      2          0      0      0     0      1  
          D          2       1      2          0      0      0     0      1  
          E          3       0      2          0      0      0     0      1  
          F          3       0      2          0      0      0     0      1  
          X          4       1      3          .      0      2     0      0  
          Y          4       1      3          .      0      2     0      0