I have the following dataset:
Name Address Bank_Account Ph_NO IP_Address Chargeoff
AJ 12 ABC Street 1234 369 12.12.34 0
CK 12 ABC Street 1234 450 12.12.34 1
DN 15 JMP Street 3431 569 13.8.09 1
MO 39 link street 8421 450 05.67.89 1
LN 12 ABC Street 1234 340 14.75.06 1
ST 15 JMP Street 8421 569 13.8.09 0`
Using this dataset I want to create the below view in SAS:
Name CountOFAddr CountBankacct CountofPhone CountOfIP CountCharegeoff
AJ 3 3 1 2 2
CK 3 3 2 2 3
DN 2 1 2 2 1
MO 1 2 2 1 2
LN 3 3 1 1 2
ST 2 2 2 2 2
The output variables indicates as follows :
-CountOfAddr : For AJ countOFAddr is 3 which means that AJ Shares its address with itself, CK and LN
-CountBankAcct : For MO count of BankAcct is 2 which means that MO Shares its bank account number with itself and ST.Similarly for variables CountofPhone and CountOfIP.
-CountChargeoff: This one is a little tricky it basically implies that AJ is Linked to CK And LN through address...and both CK and LN have been charged off so the countChargeoff for AJ is 2.
For CK
the countChargeOff
is 3
because it is linked with itself, MO
through Bank Account, and LN/AJ
through street address...so total chargeoff
in CK's
Network is 3
(CO count of AJ+CO
count of CK+CO
Count of MO+CO
count of LN
)
I currently work as a Risk Analyst in a Financial Service Firm and the code for this problem may help us to significantly reduce funding of fraudulent accounts.
Thanks.
SELECT
Name,
(SELECT Count(Address)
FROM dataset d2
WHERE d1.Address = d2.Address
) CountOFAddr,
(SELECT Count(Bank_Account)
FROM dataset d2
WHERE d1.Bank_Account = d2.Bank_Account
) CountBankacct,
(SELECT Count(Ph_NO)
FROM dataset d2
WHERE d1.Ph_NO = d2.Ph_NO
) CountofPhone,
(SELECT Count(IP_Address)
FROM dataset d2
WHERE d1.IP_Address = d2.IP_Address
) CountOfIP,
(SELECT count(d2.Chargeoff)
FROM dataset d2
WHERE d1.name <> d2.name
and ( d1.Address = d2.Address
or d1.Bank_Account = d2.Bank_Account
or d1.Ph_NO = d2.Ph_NO
or d1.IP_Address = d2.IP_Address
)
) CountCharegeoff
FROM dataset d1
I Include the charge off calculation.
Bring all d2 <> d1.name
where have any field in common. Then count that.