I'm building a manager class in PHP to manage credit card payment authorizations. With credit cards, we're allowed to keep First6, last4, expiration_Month and expiration_Year.
I'm really interested in knowing how unique the combination of these four values is, and how likely it is that two different cards share it.
How likely that is will affect when we test whether we already have a valid authorization for a new card. If we already have an authorization for a particular card, there's no need to run the numbers again; instead, we can look up the already-authorized card and re-authorize it. However, I wouldn't want to charge the wrong card just because it happens to have the same First6, last4, expiration_Month and expiration_Year.
My goal is to minimize redundant credit card data, calls to the CC processor API, and unnecessary authorizations on customers' cards.
The first 6 digits tell you what kind of card you are dealing with. For a list of issuers, see:
http://en.wikipedia.org/wiki/List_of_Issuer_Identification_Numbers
The last four digits are essentially random. The month will be essentially random, and the year will fall in a small range, from the current year to perhaps 6 years out, with some years more likely than others.
You will almost certainly have collisions if you combine those items to attempt to uniquely identify a card. That is not a reliable thing to do.
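This is the birthday problem. A rough sketch, assuming last4 and month are uniform and the year takes 7 values (current year through 6 years out), gives 10,000 × 12 × 7 = 840,000 possible combinations among cards that share one First6 prefix. The helper names below are hypothetical, written only to illustrate the estimate:

```python
# Assumption: 10,000 last4 values x 12 months x 7 expiry years, all cards
# sharing the same First6 prefix, every combination equally likely.
KEY_SPACE = 10_000 * 12 * 7  # 840,000


def collision_probability(cards: int, space: int = KEY_SPACE) -> float:
    """Exact birthday-problem probability that at least two of `cards`
    keys, drawn uniformly at random from `space` values, are equal."""
    if cards > space:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_distinct = 1.0
    for i in range(cards):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


def cards_for_even_odds(space: int = KEY_SPACE) -> int:
    """Smallest number of cards for which a collision is more likely than not."""
    p_all_distinct = 1.0
    cards = 0
    while 1.0 - p_all_distinct <= 0.5:
        # Multiply in the chance that card number `cards + 1` is new.
        p_all_distinct *= (space - cards) / space
        cards += 1
    return cards


print(cards_for_even_odds())
```

Under these assumptions, a collision becomes more likely than not after only a little over a thousand cards on a single issuer prefix. Because the year values are not equally likely, real-world odds are somewhat worse than this uniform estimate.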
EDIT
Here are examples of recent security breaches similar to this scenario:
http://blogs.cisco.com/security/6-5-million-password-hashes-suggest-a-possible-breach-at-linkedin/
http://www.infoworld.com/d/security/nvidia-investigating-breach-of-hashed-passwords-197796
https://www.infoworld.com/d/security/passwords-leaked-yahoo-boozy-preachy-angry-and-easy-197696
If a hacker can download data from the database of a large web company (typically the most-firewalled-away part of the architecture), chances are pretty good they can also access the application tier and grab the source code or compiled application that accesses the data layer.
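As an illustration, assume the plan were to also store a hash of the full card number to tell colliding cards apart (my reading of what the hashed-password breaches above warn against). For a 16-digit card, First6 and last4 leave only six unknown digits, and because the last digit is a Luhn check digit, only one in ten of those fillings is valid. The hypothetical helper `count_pan_candidates` below counts the remaining candidates with a small dynamic program over the Luhn sum mod 10:

```python
def luhn_value(digit: int, position_from_right: int) -> int:
    """Contribution of one digit to the Luhn sum (check digit is position 0)."""
    if position_from_right % 2 == 1:
        doubled = digit * 2
        return doubled - 9 if doubled > 9 else doubled
    return digit


def is_luhn_valid(number: str) -> bool:
    total = sum(luhn_value(int(d), i) for i, d in enumerate(reversed(number)))
    return total % 10 == 0


def count_pan_candidates(first6: str, last4: str, length: int = 16) -> int:
    """Count card numbers of `length` digits that start with first6,
    end with last4 and pass the Luhn check."""
    pattern = first6 + "?" * (length - len(first6) - len(last4)) + last4
    # counts[r] = ways to fill the unknown digits seen so far so that the
    # running Luhn sum is congruent to r (mod 10)
    counts = [1] + [0] * 9
    for position, char in enumerate(reversed(pattern)):
        if char != "?":
            # A known digit just shifts every residue by the same amount.
            shift = luhn_value(int(char), position)
            counts = [counts[(r - shift) % 10] for r in range(10)]
        else:
            nxt = [0] * 10
            for r, ways in enumerate(counts):
                for digit in range(10):
                    nxt[(r + luhn_value(digit, position)) % 10] += ways
            counts = nxt
    return counts[0]


print(count_pan_candidates("411111", "1111"))  # 100000
```

100,000 candidates is a trivial search, and if the attacker also has the application code (and therefore any salt or key it uses), a stored hash of the full number offers little protection.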