Search code examples
sqlsasprocdo-loops

PROC SQL SAS PROGRAMMING


I have the following dataset:

Name  Address         Bank_Account  Ph_NO    IP_Address   Chargeoff
AJ    12 ABC Street     1234        369      12.12.34         0
CK    12 ABC Street     1234        450      12.12.34         1
DN    15 JMP Street     3431        569      13.8.09          1
MO    39 link street    8421        450      05.67.89         1
LN    12 ABC Street     1234        340      14.75.06         1
ST    15 JMP Street     8421        569      13.8.09          0`

Using this dataset I want to create the below view in SAS:

 Name   CountOFAddr CountBankacct CountofPhone CountOfIP CountCharegeoff
  AJ       3             3           1            2             2
  CK       3             3           2            2             3
  DN       2             1           2            2             1
  MO       1             2           2            1             2
  LN       3             3           1            1             2
  ST       2             2           2            2             2

The output variables indicates as follows :

-CountOfAddr : For AJ countOFAddr is 3 which means that AJ Shares its address with itself, CK and LN

-CountBankAcct : For MO count of BankAcct is 2 which means that MO Shares its bank account number with itself and ST.Similarly for variables CountofPhone and CountOfIP.

-CountChargeoff: This one is a little tricky it basically implies that AJ is Linked to CK And LN through address...and both CK and LN have been charged off so the countChargeoff for AJ is 2.

For CK the countChargeOff is 3 because it is linked with itself, MO through Bank Account, and LN/AJ through street address...so total chargeoff in CK's Network is 3(CO count of AJ+CO count of CK+CO Count of MO+CO count of LN)

I currently work as a Risk Analyst in a Financial Service Firm and the code for this problem may help us to significantly reduce funding of fraudulent accounts.

Thanks.


Solution

  • SQL Fiddle Demo

    SELECT 
        Name,
        (SELECT Count(Address)
         FROM dataset d2
         WHERE d1.Address = d2.Address 
        ) CountOFAddr,
    
        (SELECT Count(Bank_Account)
         FROM dataset d2
         WHERE d1.Bank_Account = d2.Bank_Account
        ) CountBankacct,
    
        (SELECT Count(Ph_NO)
         FROM dataset d2
         WHERE d1.Ph_NO = d2.Ph_NO
        ) CountofPhone,
    
        (SELECT Count(IP_Address)
         FROM dataset d2
         WHERE d1.IP_Address = d2.IP_Address
        ) CountOfIP, 
    
        (SELECT count(d2.Chargeoff) 
         FROM dataset d2
         WHERE  d1.name <> d2.name
           and (   d1.Address = d2.Address
                or d1.Bank_Account = d2.Bank_Account
                or d1.Ph_NO = d2.Ph_NO 
                or d1.IP_Address = d2.IP_Address
               )
        ) CountCharegeoff           
    FROM dataset d1
    

    I Include the charge off calculation.

    Bring all d2 <> d1.name where have any field in common. Then count that.