Search code examples
statadata-cleaning

Counting number of prior observation excluding those belonging to a certain group


I have loan level data which has the following structure and want to create the variable Number

Loan  Borrower Lender    Date       Crop     Country  Number
 1       A       X     01/01/20     Coffee     USA      0
 2       B       X     01/02/20     Coffee     USA      0
 3       C       X     01/03/20     Coffee     USA      0
 4       D       X     01/04/20     Coffee     USA      0
 5       E       X     01/05/20     Banana     USA      4
 6       F       X     01/06/20     Banana     USA      4
 7       G       X     01/07/20     Coffee     USA      2
 8       H       X     01/08/20     Orange     USA      7
 9       I       X     01/09/20     Coffee     USA      3
 .       .       .        .            .        .       .
 .       .       .        .            .        .       .

I want to number my loan based on this set of rules

  1. How many loans has the lender issued up to this point (including this loan)
  2. This number should only include loans in the same country as my loan
  3. This number should exclude all loans given out in the same crop

Hence I am left with a number for each observation which states the number of loans given out by the lender in the same country as said loan but excluding those observations in the country which also occur in the same crop.

So far I tried running:

bysort Lender Country (Date): gen var = _n 

The problem with this is that I don't subtract the observations which occur in the same crop.


Solution

  • * Example generated by -dataex-. 
    clear
    input byte Loan str8 Borrower str6 Lender float Date str6 Crop str7 Country byte Number
    1 "A" "X" 21915 "Coffee" "USA" 0
    2 "B" "X" 21916 "Coffee" "USA" 0
    3 "C" "X" 21917 "Coffee" "USA" 0
    4 "D" "X" 21918 "Coffee" "USA" 0
    5 "E" "X" 21919 "Banana" "USA" 4
    6 "F" "X" 21920 "Banana" "USA" 4
    7 "G" "X" 21921 "Coffee" "USA" 2
    8 "H" "X" 21922 "Orange" "USA" 7
    9 "I" "X" 21923 "Coffee" "USA" 3
    end
    format %td Date
    
    bysort Crop (Date) : gen this = _n
    bysort Crop Date (this): replace this = this[_N] 
    sort Loan 
    gen wanted1 = _n - this 
    
    bysort Country (Date) : replace this = _n
    bysort Country Date (this): replace this = this[_N] 
    sort Loan 
    gen wanted2 = _n - this 
    
    list 
    
         +---------------------------------------------------------------------------------------------+
         | Loan   Borrower   Lender        Date     Crop   Country   Number   this   wanted1   wanted2 |
         |---------------------------------------------------------------------------------------------|
      1. |    1          A        X   01jan2020   Coffee       USA        0      1         0         0 |
      2. |    2          B        X   02jan2020   Coffee       USA        0      2         0         0 |
      3. |    3          C        X   03jan2020   Coffee       USA        0      3         0         0 |
      4. |    4          D        X   04jan2020   Coffee       USA        0      4         0         0 |
      5. |    5          E        X   05jan2020   Banana       USA        4      5         4         0 |
         |---------------------------------------------------------------------------------------------|
      6. |    6          F        X   06jan2020   Banana       USA        4      6         4         0 |
      7. |    7          G        X   07jan2020   Coffee       USA        2      7         2         0 |
      8. |    8          H        X   08jan2020   Orange       USA        7      8         7         0 |
      9. |    9          I        X   09jan2020   Coffee       USA        3      9         3         0 |
         +---------------------------------------------------------------------------------------------+